[Robots] Perl and LWP robots

2002-03-07 Thread Sean M. Burke
on't hammer the server; always obey the robots.txt; don't span hosts unless you are really sure that you want to), are there any particular bits of wisdom that list members would want me to pass on to my readers? -- Sean M. Burke [EMAIL PROTECTED] http://www.spinn.net/~sburke/
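
The "always obey the robots.txt" and "don't hammer the server" advice can be illustrated with a minimal Python sketch. It uses the standard library's urllib.robotparser rather than Perl's WWW::RobotRules, and it parses a hypothetical robots.txt body from a string so that no network access is needed. The host www.example.com, the agent name ExampleBot/0.1, and the helper names load_rules and allowed_urls are all illustrative assumptions.

```python
import urllib.robotparser

# Hypothetical robots.txt body, parsed locally so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def load_rules(text):
    """Build a RobotFileParser from robots.txt text (instead of fetching it with .read())."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(text.splitlines())
    return rules

def allowed_urls(rules, agent, urls):
    """Keep only the URLs that robots.txt permits this agent to fetch."""
    return [u for u in urls if rules.can_fetch(agent, u)]

rules = load_rules(ROBOTS_TXT)
candidates = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/notes.html",
    "http://www.example.com/docs/a.html",
]
allowed = allowed_urls(rules, "ExampleBot/0.1", candidates)
# Seconds to wait between requests; fall back to 1 if robots.txt gives no Crawl-delay.
delay = rules.crawl_delay("ExampleBot/0.1") or 1
```

In a real crawler you would call set_url() and read() to fetch the live robots.txt, and time.sleep(delay) between requests to the same host.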

[Robots] Re: Perl and LWP robots

2002-03-07 Thread Sean M. Burke
u/~sburke/pub/"; or "http://www.";), and kinds that go hog-wild across all of the Web. The usefulness of the single-host spiders is pretty obvious to me. But why do people want to write spiders that potentially span all/any hosts? (Aside from people who are working for Goo
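
The "single-host" kind of spider described above stays under one starting URL. A minimal Python sketch of that scope check, assuming "in scope" means the same scheme family, the same host, and a path under the starting URL's path prefix, might look like this. The function name in_scope and the example URLs are hypothetical, and port numbers are deliberately ignored.

```python
from urllib.parse import urljoin, urlsplit

def in_scope(start_url, link, base=None):
    """Return True if link, resolved against base (default: start_url),
    stays on start_url's host and under start_url's path prefix."""
    absolute = urljoin(base or start_url, link)  # resolve relative links
    start, cand = urlsplit(start_url), urlsplit(absolute)
    return (
        cand.scheme in ("http", "https")          # skip mailto:, ftp:, etc.
        and cand.hostname == start.hostname       # .hostname is lowercased
        and cand.path.startswith(start.path)      # stay under the start path
    )
```

A spider that spans hosts would simply drop the hostname and path tests, which is exactly the point at which politeness rules become harder to enforce.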

[Robots] matching and "UserAgent:" in robots.txt

2002-03-14 Thread Sean M. Burke
eans that every robot ID string has to appear in toto on the "User-Agent" robots.txt line, which is clearly a bad thing. But before I submit a patch, I'm tempted to ask... what /is/ the proper behavior? Maybe shave the current user-agent's name at the first slash or space

[Robots] Re: matching and "UserAgent:" in robots.txt

2002-03-14 Thread Sean M. Burke
Oops, I just noticed that my topic has "UserAgent:" where I meant "User-Agent:". -- This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server comm

[Robots] Re: matching and "UserAgent:" in robots.txt

2002-03-14 Thread Sean M. Burke
At 12:47 2002-03-14 +0100, Martin Beet wrote:
> On Thu, 14 Mar 2002 03:08:21 -0700, Sean M Burke (SMB) said
> SMB> I'm a bit perplexed over whether the current Perl library
> SMB> WWW::RobotRules implements a certain part of the Robots Exclusion
> SMB> Standard correct

[Robots] Re: matching and "UserAgent:" in robots.txt

2002-03-14 Thread Sean M. Burke
of the list on this before bringing it up with the others, tho.

[Robots] Re: matching and "UserAgent:" in robots.txt

2002-03-15 Thread Sean M. Burke
ethod's logic says "well, it doesn't end in '/number.number', so there's no version to strip off". So I'm going to send Gisle Aas a patch so that the first word, minus any version suffix, is what's used for matching. It's just a matter of adding
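
The proposed rule (use the first word of the robot's name, minus any version suffix, for matching) can be sketched in Python. This is not the WWW::RobotRules code. It assumes the short name is compared case-insensitively as a substring of the robots.txt "User-Agent:" value, and that "*" matches every robot; the direction of the substring test is an assumption. The helper names short_name and ua_line_applies are hypothetical.

```python
def short_name(agent_string):
    """First whitespace-delimited word of the agent string, minus any /version suffix.

    "MyBot/1.0 (+http://www.example.com/bot.html)" -> "MyBot", even though the
    full string does not end in "/number.number".
    """
    words = agent_string.split()
    if not words:
        return ""
    return words[0].split("/", 1)[0]

def ua_line_applies(ua_value, agent_string):
    """Does a robots.txt 'User-Agent:' value apply to this robot?

    Assumption: case-insensitive substring match of the short name
    against the value, with '*' matching everyone.
    """
    value = ua_value.strip()
    if value == "*":
        return True
    name = short_name(agent_string).lower()
    return bool(name) and name in value.lower()
```

Stripping at the first space as well as the first slash is what lets a name like "MyBot/1.0 (+http://...)" match a robots.txt line that says just "MyBot".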

[Robots] Re: better language for writing a Spider ?

2002-03-15 Thread Sean M. Burke
e, books/articles/modules to write, etc.), but we do at times manage to do what needs doing, if it's pointed out clearly enough to stand out from the torrent of email messages (which I find incessantly discouraging) that manage no better than "halo I try to use LWP with hotmel but not wo

[Robots] leading whitespace in robots.txt files

2002-03-25 Thread Sean M. Burke
one out there is using leading-whitespace lines as comments, or as RFC-822-style continuation lines! Thoughts, anyone?
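
One possible answer to the leading-whitespace question is to treat indentation as insignificant, so that an indented line is parsed like any other line rather than as a comment or an RFC-822-style continuation. A minimal Python sketch of that policy follows. The policy itself and the helper name parse_robots_lines are assumptions for illustration, not a ruling on what the standard requires. Record grouping by blank lines is omitted, so the function returns only the field/value pairs.

```python
def parse_robots_lines(text):
    """Return (field, value) pairs from robots.txt text.

    Assumed policy: leading and trailing whitespace is ignored, so an
    indented line is an ordinary line (not a comment, not a continuation).
    '#' starts a comment; lines without a ':' are skipped.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the comment, then surrounding whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        pairs.append((field.strip().lower(), value.strip()))
    return pairs
```

Under the continuation-line reading, the indented "Disallow" line would instead be appended to the previous field's value, which would change which paths are excluded.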

RE: [Robots] robot in python?

2003-11-26 Thread Sean M. Burke
Sound as a sensory perception, clearly not in the interest of any individual or body or civilization, if it were possible in the first place. You talk funny! This pleases me. -- Sean M. Burke http://search.cpan.org/~sburke/