Sat, 21 Aug 2004 20:31:00 EDT (-0400)

I tried one called weblech, which I found at http://sourceforge.net/projects/weblech/, and it was pretty cool. There are some bugs somewhere, and it stops before it has completely downloaded my website (http://www.geocities.com/jameswi.geo), so it doesn't quite meet my minimum requirements. It's still pretty cool though, and it looked like it would have downloaded the whole crfh site (it was busily downloading the comics when I finally stopped it). I found some interesting content that I wouldn't have found myself.

On my site it didn't find and download my http://www.geocities.com/jameswi.geo/FandSF/PAnderson/PAnderson.htm pages, even though they are linked. It might have been a bug with the search level, although I set it to 0 (infinite depth) -- or maybe a problem with case sensitivity.

It's an older project, dated June 2002. I didn't get the promised GUI with the download. It didn't stop when I pressed any key, as advertised. Sometimes I had to help it along by pressing Enter at startup, and by manually creating the local directory to which it copies files; once it tried to create this itself but failed to mark it as a folder and then couldn't download any content after that.

It has an innovative "interesting URL" field you can set, but I was disappointed with the inherent limitations of setting the include string for desired URLs to be downloaded -- it seems that if I specify the comics' image URL pattern, it won't go to the containing HTML page in the first place, and if I specify the HTML page URL pattern, that won't match the image URL pattern. The "interesting URL" field didn't seem to have much effect; maybe I could have tried putting only my pattern in there and deleting the others.
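To show what I mean about the include string, here is a minimal Python sketch of how I imagine a spider could get around it: one pattern decides which pages to follow for links, and a separate pattern decides which files to keep. This is not how weblech works, as far as I can tell. The pattern values, the /comics/ path and the function names are all made up for illustration.

```python
import re

# Hypothetical patterns for illustration only; these are not weblech settings.
# Pages matching FOLLOW_PATTERN are fetched and parsed for further links.
FOLLOW_PATTERN = re.compile(r"\.html?$|/$", re.IGNORECASE)
# Files matching SAVE_PATTERN are written to the local directory.
SAVE_PATTERN = re.compile(r"/comics/.*\.(gif|jpg|png)$", re.IGNORECASE)


def classify(url):
    """Return a (follow, save) pair of booleans for one URL."""
    follow = bool(FOLLOW_PATTERN.search(url))
    save = bool(SAVE_PATTERN.search(url))
    return follow, save


def plan(urls):
    """Split a list of URLs into those to crawl and those to keep."""
    to_follow = [u for u in urls if classify(u)[0]]
    to_save = [u for u in urls if classify(u)[1]]
    return to_follow, to_save
```

With two patterns, the HTML page that contains a comic can be visited without being saved, and the image it links to can be saved without needing to match the page pattern. A single include string can't express both.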
If I have more time I can try something else, like one of those dedicated comics spiders you mentioned. -Jim