I've only tried one or two web spiders, and I wasn't terribly impressed with either. That said, there are probably a hundred web spiders on SourceForge. Search for 'web spider' and make sure you check the box that says you want ALL words (it does an 'OR' by default).
If you are just particularly interested in CRFH, then maybe you want the CRFH Archive Project. You can also search for 'web comics' and find hundreds of specialized comic spiders.
I still do it the old-fashioned way. I've been meaning to get around to making myself an internet portal homepage with a comic-spider back end. That way, whenever I start up my browser, I can get to a single page with all of my comics with just one click. It hasn't been a high priority, though, as I have 3-megabit DSL right now, so the old-fashioned way is pretty durn fast.
Wed, 18 Aug 2004 17:20:00 EDT (-0400) Even if you weren't terribly impressed with them, would they still meet my minimum requirements of downloading the contents of a website unattended and refraining from crashing (Windows 98) until they've done that? Which would be the best, or least objectionable, of the one or two that you tried? It would be nice to have that URL matching feature, if possible. And of course freeware would be nice. -Jim
I've used the HTTrack Website Copier built into Spiderzilla (a website downloader that you can load as a Mozilla plugin) and another called, IIRC, 'black widow' or something similar. Both would meet your criteria.
As for freeware, both are, as is everything on SourceForge.
Sat, 21 Aug 2004 20:31:00 EDT (-0400) I tried one called weblech, which I found at http://sourceforge.net/projects/weblech/, and it was pretty cool. There are some bugs somewhere, and it doesn't download my website (http://www.geocities.com/jameswi.geo) completely before stopping, so it doesn't quite meet my minimum requirements. It's still pretty cool though, and it looked like it would have downloaded the whole CRFH site (it was busily downloading the comics when I finally stopped it). I found some interesting content that I wouldn't have found myself.
On my site it didn't find and download my http://www.geocities.com/jameswi.geo/FandSF/PAnderson/PAnderson.htm pages, even though they are linked. It might have been a bug with the search level, although I set it to 0 (infinite depth) -- or maybe a problem with case sensitivity. It's an older project, dated June 2002. I didn't get the promised GUI with the download. It didn't stop when I pressed any key, as advertised. Sometimes I had to help it along by pressing Enter at startup, and by manually creating the local directory to which it copies files; once it tried to create this itself but failed to mark it as a folder and then couldn't download any content after that.
It has an innovative "interesting URL" field you can set, but I was disappointed with the inherent limitations of the include string for URLs to be downloaded -- it seems that if I specify the comics' image URL pattern, it won't go to the containing HTML page in the first place, and if I specify the HTML page URL pattern, that won't match the image URL pattern. The "interesting URL" field didn't seem to have much effect; maybe I could have tried putting only my pattern in there and deleting the others.
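One way around the single include-string limitation described above is to give the spider two separate patterns: one for pages to walk through and one for files to keep. What follows is only a rough sketch of that idea in Python, not how weblech actually works; the names crawl, fetch, follow_pattern, save_pattern, and FAKE_SITE are made up for illustration, and the "site" is an in-memory dictionary so the example runs without a network connection.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags and src targets from <img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def crawl(start_url, fetch, follow_pattern, save_pattern):
    """Walk pages matching follow_pattern; return URLs matching save_pattern.

    fetch is a hypothetical callable mapping a URL to HTML text
    (or None if the page is unavailable).
    """
    follow_re = re.compile(follow_pattern)
    save_re = re.compile(save_pattern)
    seen, queue, saved = {start_url}, [start_url], []
    while queue:
        page_url = queue.pop(0)
        html = fetch(page_url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for raw in parser.urls:
            # Resolve relative links; the path's letter case is left untouched.
            url = urljoin(page_url, raw)
            if url in seen:
                continue
            seen.add(url)
            if save_re.search(url):
                saved.append(url)   # a file we want to keep
            elif follow_re.search(url):
                queue.append(url)   # a page we only walk through
    return saved


# Made-up two-page site standing in for a comic archive.
FAKE_SITE = {
    "http://example.test/index.html":
        '<a href="page2.html">next</a><img src="comics/001.gif">',
    "http://example.test/page2.html":
        '<a href="index.html">prev</a><img src="comics/002.gif">'
        '<a href="http://other.test/">away</a>',
}

saved = crawl(
    "http://example.test/index.html",
    FAKE_SITE.get,
    follow_pattern=r"^http://example\.test/.*\.html$",
    save_pattern=r"/comics/.*\.gif$",
)
print(saved)
```

With the two patterns kept apart, the HTML pages are visited even though they don't match the image pattern, and only the images end up in the saved list. If follow_pattern is set to something that matches nothing, the sketch reproduces the problem described above: it never leaves the first page.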
If I have more time I can try something else, like one of those dedicated comics spiders you mentioned. -Jim
Wed, 18 Aug 2004 19:19:00 (-0400) I just finished reading through the entire CRFH series (http://www.crfh.net). Amusing. I liked the one where Mike read the cough medicine label with the list of horrendous side-effects, chugged the whole thing down, and tossed the bottle away, saying "Minty." Also, poor April needs a boyfriend, a life, and some superpowers. At least Margaret has some training and a would-be. So what is there to look at on the CRFH Archive Project? -Jim
Thu, 19 Aug 2004 10:19:00 EDT (-0400) Thanks for the info. Wow, my memory certainly is playing tricks on me. I could have sworn that CRFH was one of the webcomics that you listed, but when I checked back to your Feb 27 entry at http://www.livejournal.com/users/swestrup/135164.html sure enough, it's not included. Maybe it's time for a dose of Geritol (however, those who remember *that* product are probably simultaneously old enough for it but not in need of it!). -Jim
Date: 2004-08-23 08:54 am (UTC) I can recommend CRFH. It grows on you. The graphic artwork improves drastically from its beginning. -Jim