[Robots] Re: Perl and LWP robots

2002-03-08 Thread Matthias Jaekle
Hello >The usefulness of the single-host spiders is pretty obvious to me. >But why do people want to write spiders that potentially span all/any hosts? >(Aside from people who are working for Google or similar.) > Maybe the better question is, why do people want to write spiders that span many

[Robots] Re: Perl and LWP robots

2002-03-08 Thread Alex McLintock
--- Avi Rappoport <[EMAIL PROTECTED]> wrote: > > At 3:43 PM -0700 3/7/02, Sean M. Burke wrote: > >The usefulness of the single-host spiders is pretty obvious to me. > >But why do people want to write spiders that potentially span all/any hosts? > >(Aside from people who are working for Google o

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Tim Bray
At 11:31 AM 07/03/02 -0800, Nick Arnett wrote: >> * Write it in Perl (or equivalent). > >I suppose it doesn't help with a book on Perl, but I'm re-writing my robots >in Python and I'm very happy with the way it's going. I consider Python to fall under "or equivalent" :) >> * Consider

[Robots] Re: Perl and LWP robots

2002-03-07 Thread B Leong
People write spiders that potentially span all/any hosts to harvest those email addresses for the annoying spam, to see if trademarks are being used illegally, to see if copyrights are being violated, etc. > The replies to my request for advice have been very helpful! I'll pick one > and reply

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Avi Rappoport
At 3:43 PM -0700 3/7/02, Sean M. Burke wrote: >The usefulness of the single-host spiders is pretty obvious to me. >But why do people want to write spiders that potentially span all/any hosts? >(Aside from people who are working for Google or similar.) People think a robot can be an intelligent a

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Sean M. Burke
The replies to my request for advice have been very helpful! I'll pick one and reply to it: At 10:01 2002-03-07 -0800, Otis Gospodnetic wrote: >[about my forthcoming book] >(i.e. I'm a potential customer :)) When will it be published? It's probably going into tech edit later this month. So i

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Nick Arnett
> -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of Tim Bray [snip] > * Write it in Perl (or equivalent). I suppose it doesn't help with a book on Perl, but I'm re-writing my robots in Python and I'm very happy with the way it's going. Perform

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Klaus Johannes Rusch
In <[EMAIL PROTECTED]>, "Sean M. Burke" <[EMAIL PROTECTED]> writes: > Aside from basic concepts (don't hammer the server; always obey the > robots.txt; don't span hosts unless you are really sure that you want to), > are there any particular bits of wisdom that list members would want me to > pa
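"Always obey the robots.txt" means consulting the rules before every request. In Perl, LWP::RobotUA does this through WWW::RobotRules. The sketch below does the same check with Python's standard urllib.robotparser. It parses an already-fetched robots.txt body, so nothing touches the network. The robots.txt text and the "ExampleBot/0.1" user-agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body and user-agent, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""
AGENT = "ExampleBot/0.1"

def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been fetched."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

rp = load_rules(ROBOTS_TXT)
for url in ("http://example.com/private/page.html",
            "http://example.com/public/page.html"):
    # Check the rules before every single request, not once per site.
    print(url, "allowed" if rp.can_fetch(AGENT, url) else "disallowed")

# Crawl-delay is a non-standard extension; crawl_delay() returns None if absent.
print("crawl delay:", rp.crawl_delay(AGENT))
```

A real robot would fetch /robots.txt once per host, for example with `RobotFileParser.set_url()` and `read()`. It would cache the parsed rules rather than re-reading the file for each page.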

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Avi Rappoport
I've found that image maps, framesets, redirects, funky relative links, JavaScript links and dynamic URLs generated from backend systems are the main problems with robots. Also bad HTML on pages so the robot gets confused parsing it, such as unclosed tags. I have written up a checklist for
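Several of the problems listed here, including funky relative links, JavaScript links and accidental host spanning, come down to how a robot turns an `href` into a URL it will queue. Below is a minimal sketch using Python's urllib.parse. The helper names `normalize_link` and `same_host` are hypothetical, and the policy of simply skipping non-HTTP schemes is an assumption, not something proposed in the thread.

```python
from urllib.parse import urljoin, urldefrag, urlsplit

def normalize_link(base_url: str, href: str):
    """Resolve href against the page it appeared on. Return None for links a
    simple robot cannot follow (javascript:, mailto:, and other non-HTTP schemes)."""
    absolute = urljoin(base_url, href.strip())
    absolute, _fragment = urldefrag(absolute)  # "#top" points into the same page
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute

def same_host(url_a: str, url_b: str) -> bool:
    """True if both URLs name the same host (and port), ignoring case."""
    return urlsplit(url_a).netloc.lower() == urlsplit(url_b).netloc.lower()

base = "http://example.com/a/b/page.html"
for href in ("../c/other.html#top", "./x.html", "/root.html",
             "//other.org/p", "javascript:void(0)", "mailto:someone@example.com"):
    print(repr(href), "->", normalize_link(base, href))
```

Stripping the fragment keeps the robot from fetching one page several times under different `#` anchors. A check such as `same_host(base, link)` is one simple way to avoid spanning hosts by accident.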

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Matthew Meadows
That's a curious remark about readers and their misplaced desire for recursive spiders. A recursive spider allows its user to drill down into a particular information domain and ultimately exhaust it if the spider is capable enough. This is of enormous benefit to the information researcher look

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Michael Lange
Hi Sean, You might want to consider exploring the "not yet approved" updated robots.txt standard that covers allow rules and how to apply them to your spider. This may help raise the level of awareness on the robots.txt standard. You could also talk about how to use the robots.txt with your spid
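To illustrate Allow rules, the sketch below uses Python's urllib.robotparser, which understands `Allow:` lines. One caveat applies: this parser uses the first rule whose path prefix matches, so the narrow Allow must come before the broader Disallow. The later standard (RFC 9309) instead uses the most specific, i.e. longest, match, and other robots may behave that way. The robots.txt content below is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /private/ except for one public report.
# With Python's first-match evaluation, the Allow line must come first.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/annual-report.html
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def allowed(path: str) -> bool:
    """Check a path on the (hypothetical) host example.com."""
    return rp.can_fetch("ExampleBot/0.1", "http://example.com" + path)

for path in ("/private/annual-report.html", "/private/payroll.html", "/index.html"):
    print(path, allowed(path))
```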

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Otis Gospodnetic
Excellent. I have a copy of Wong's book at home and like that topic (i.e. I'm a potential customer :)) When will it be published? I think lots of people do want to know about recursive spiders, and I bet one of the most frequent obstacles are issues like: queueing, depth vs. breadth first crawl
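Queueing and depth-first versus breadth-first crawling can both be seen in one small loop. The sketch below keeps a frontier of (URL, depth) pairs and a set of URLs already queued. Taking from the front of the frontier (FIFO) gives breadth-first order, and taking from the back (LIFO) gives depth-first-style order. The link graph and the `crawl_order` function are hypothetical stand-ins for fetching and parsing real pages.

```python
from collections import deque

# Hypothetical link graph standing in for "fetch page, extract links".
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [], "/a/2": [], "/b/1": ["/"],  # "/b/1" links back to the root
}

def get_links(url):
    return LINKS.get(url, [])

def crawl_order(start, get_links, breadth_first=True, max_depth=3):
    """Return the order in which pages would be visited."""
    frontier = deque([(start, 0)])
    seen = {start}  # mark URLs when queued so cycles are never re-queued
    order = []
    while frontier:
        # FIFO (popleft) = breadth-first; LIFO (pop) = depth-first style.
        url, depth = frontier.popleft() if breadth_first else frontier.pop()
        order.append(url)
        if depth >= max_depth:
            continue  # record the page but do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return order

print(crawl_order("/", get_links, breadth_first=True))
print(crawl_order("/", get_links, breadth_first=False))
print(crawl_order("/", get_links, max_depth=1))
```

Breadth-first order with a depth limit tends to reach a site's shallow, well-linked pages first. A depth-first order can wander deep into one branch, such as a calendar or a dynamically generated URL space, before seeing the rest of the site.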

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Chris Skepper
> Aside from basic concepts (don't hammer the server; always obey the > robots.txt; don't span hosts unless you are really sure that you want to), > are there any particular bits of wisdom that list members would want me to > pass on to my readers? Look at http://www.robotstxt.org/wc/guidelin

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Tim Bray
At 02:51 AM 07/03/02 -0700, Sean M. Burke wrote: >Aside from basic concepts (don't hammer the server; always obey the >robots.txt; don't span hosts unless you are really sure that you want to), >are there any particular bits of wisdom that list members would want me to >pass on to my readers?
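"Don't hammer the server" usually means enforcing a minimum gap between requests to the same host. In Perl, LWP::RobotUA does this with its `delay` setting, expressed in minutes. The class below is a hypothetical Python equivalent that tracks the last request time per host. The 0.2-second delay is chosen only so the demo runs quickly. A real robot would use a much longer gap, or the site's Crawl-delay if one is given.

```python
import time
from urllib.parse import urlsplit

class HostThrottle:
    """Keep at least `min_delay` seconds between requests to the same host.
    Requests to different hosts do not delay each other."""

    def __init__(self, min_delay: float = 1.0):
        self.min_delay = min_delay
        self.last_request = {}  # host -> time.monotonic() of its last request

    def wait(self, url: str) -> float:
        """Sleep if needed before fetching `url`; return the seconds slept."""
        host = urlsplit(url).netloc.lower()
        slept = 0.0
        last = self.last_request.get(host)
        if last is not None:
            slept = max(0.0, self.min_delay - (time.monotonic() - last))
            if slept > 0:
                time.sleep(slept)
        self.last_request[host] = time.monotonic()
        return slept

throttle = HostThrottle(min_delay=0.2)  # deliberately short, for the demo only
first = throttle.wait("http://example.com/a.html")
second = throttle.wait("http://example.com/b.html")  # same host: waits
other = throttle.wait("http://example.org/c.html")   # different host: no wait
print(first, round(second, 2), other)
```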