Re: SOT but... any one using a bot trap?

2007-11-02 Thread Tom Chiverton
On Thursday 01 Nov 2007, Claude Schneegans wrote:
> Good, but apparently their site is closed for the time being.

Seems to be running here.

--
Tom Chiverton
Helping to advantageously bully ubiquitous e-business on: http://thefalken.livejournal.com

Re: SOT but... any one using a bot trap?

2007-11-01 Thread Tom Chiverton
On Wednesday 31 Oct 2007, [EMAIL PROTECTED] wrote:
> Apparently they publish this very simple-to-parse blacklist every day.

Project Honey Pot have a DNS-based blacklist system too. You construct a hostname based on your API key, the IP to query, and a standard TLD, and if it resolves you don't let
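For context, here is a minimal CFML sketch of the kind of DNS lookup Tom describes. The hostname layout (access key, reversed IP octets, the dnsbl.httpbl.org zone) and the meaning of a positive answer are assumptions based on Project Honey Pot's published http:BL scheme rather than anything stated in this thread, the access key is a placeholder, and the code assumes an IPv4 visitor address.

<!--- Build the http:BL query hostname: key + reversed IP octets + zone --->
<cfset accessKey = "myAccessKey">  <!--- hypothetical placeholder key --->
<cfset octets = listToArray(cgi.remote_addr, ".")>
<cfset reversedIP = octets[4] & "." & octets[3] & "." & octets[2] & "." & octets[1]>
<cfset queryHost = accessKey & "." & reversedIP & ".dnsbl.httpbl.org">

<cfset isListed = false>
<cftry>
    <!--- If the name resolves at all, the visitor is on the list; the
          returned 127.x.y.z address encodes age, threat score and type --->
    <cfset answer = createObject("java", "java.net.InetAddress").getByName(queryHost)>
    <cfset isListed = true>
    <cfcatch type="any">
        <!--- No DNS answer: the IP is not listed --->
        <cfset isListed = false>
    </cfcatch>
</cftry>

<cfif isListed>
    <!--- Deny or challenge the request --->
    <cfheader statuscode="403" statustext="Forbidden">
    <cfabort>
</cfif>

Caching the answer per IP for a few hours would avoid a DNS round trip on every request.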

Re: SOT but... any one using a bot trap?

2007-11-01 Thread Claude Schneegans
> Project Honey Pot have a DNS-based blacklist system too.

Good, but apparently their site is closed for the time being. Is their blacklist system still working?

--
___ REUSE CODE! Use custom tags; See http://www.contentbox.com/claude/customtags/tagstore.cfm

Re: SOT but... any one using a bot trap?

2007-10-31 Thread Tom Chiverton
On Saturday 27 Oct 2007, [EMAIL PROTECTED] wrote:
> So is it really working?

I use Project Honey Pot's service (hide some bot trap links in pages, and then they look for spam coming from people who visited those links, subtracting known good IPs). I have my own DNS so have donated an MX record

Re: SOT but... any one using a bot trap?

2007-10-31 Thread Claude Schneegans
> I use Project Honey Pot's service

Thanks, I'll sure have a look, though my own system is already quite advanced:
- I automatically detect robots when they fall into the trap.
- Known robots will only see text; no image is displayed.
- I can verify the host and if anything looks suspicious, flag
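As an illustration only, the "known robots see text only" rule might look something like this in CFML; the datasource and table names here are invented, not Claude's actual code.

<!--- Hypothetical lookup of IPs previously caught by the trap --->
<cfquery name="qTrapped" datasource="mySite">
    SELECT ipAddress
    FROM   trappedBots
    WHERE  ipAddress = <cfqueryparam value="#cgi.remote_addr#" cfsqltype="cf_sql_varchar">
</cfquery>

<cfif qTrapped.recordCount>
    <!--- Known robot: serve the text only, emit no image markup --->
<cfelse>
    <img src="/images/story123.jpg" alt="Story photo">
</cfif>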

RE: SOT but... any one using a bot trap?

2007-10-29 Thread Dave Watts
> I don't see the problem: users with Web Accelerator get their stuff from Google's cache, not from my server, so I don't even hear about them. Anyway, our sites are dynamic, we publish news every day, so robots are asked not to use the cache anyway.

No, Google Web Accelerator doesn't rely on

Re: SOT but... any one using a bot trap?

2007-10-29 Thread Claude Schneegans
> No, Google Web Accelerator doesn't rely on Google's cache, it prefetches links from your server.

Are you sure about that? It uses client software installed on the user's computer, as well as data caching on Google's servers... ( http://en.wikipedia.org/wiki/Google_Web_Accelerator ) Sending

RE: SOT but... any one using a bot trap?

2007-10-28 Thread Dave Watts
> If it gets to a page hrefed on a 1 pixel blank image, it cannot be a human browser.

Sure it can. I can think of two examples off the top of my head - someone with Google Web Accelerator installed, or someone using Lynx.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
Fig Leaf

RE: SOT but... any one using a bot trap?

2007-10-28 Thread Dave Watts
> Who needs to be scanned by oddities like disco/Nutch-0.9 (experimental crawler; [EMAIL PROTECTED])

Nutch is part of Lucene, I think. So, you may well need to be scanned by that.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
Fig Leaf Software provides the highest caliber

Re: SOT but... any one using a bot trap?

2007-10-28 Thread Claude Schneegans
> Nutch is part of Lucene, I think. So, you may well need to be scanned by that.

I'll make up my mind when they put a proper web address in their user agent and I can see for myself why they are crawling my sites. The word "experimental" and just an email address don't look too

Re: SOT but... any one using a bot trap?

2007-10-28 Thread Claude Schneegans
> someone with Google Web Accelerator installed,

I don't see the problem: users with Web Accelerator get their stuff from Google's cache, not from my server, so I don't even hear about them. Anyway, our sites are dynamic, we publish news every day, so robots are asked not to use the cache anyway.

Re: SOT but... any one using a bot trap?

2007-10-27 Thread James Holmes
Damn, that's a problem. Have you confirmed that the robots have requested and successfully retrieved robots.txt (perhaps search the web server's logs)?

On 10/27/07, Claude Schneegans [EMAIL PROTECTED] wrote:
> Hi, I tried to implement a bad robot trap, I mean those that do not honor the
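A rough sketch of the log check James suggests, in CFML; the log path, log format and sample IP are assumptions to adapt to your own server.

<cfset logPath = "C:\logs\W3SVC1\ex071027.log">   <!--- assumed IIS log location --->
<cfset suspectIP = "192.0.2.10">                  <!--- IP of the robot you caught --->
<cffile action="read" file="#logPath#" variable="logText">
<cfloop list="#logText#" delimiters="#chr(10)#" index="line">
    <!--- Show every request for robots.txt made by that IP --->
    <cfif findNoCase("robots.txt", line) AND find(suspectIP, line)>
        <cfoutput>#line#<br></cfoutput>
    </cfif>
</cfloop>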

Re: SOT but... any one using a bot trap?

2007-10-27 Thread Claude Schneegans
> Have you confirmed that the robots have requested and successfully retrieved robots.txt (perhaps search the web server's logs)?

No, I do not trace reading of robots.txt. In principle, a good robot should read and honor it. Obviously, there is no absolutely good robot. I use Copernic for

RE: SOT but... any one using a bot trap?

2007-10-27 Thread Dave Watts
> But making a search on the string "This page was illegitimately indexed" reveals that most legitimate robots have found it: Netscape, Google, AOL, CompuServe... you name it. So is it really working? IMO it is not safe to ban any robot on that basis alone.

I can tell you with absolute

Re: SOT but... any one using a bot trap?

2007-10-27 Thread Claude Schneegans
> I can tell you with absolute certainty that Google obeys robots.txt.

I'm pretty sure they do. But we all know that sometimes an HTTP request is lost somewhere in cyberspace. If for any reason the robot does not receive the file, it will probably act as if there is none. Only once will

RE: SOT but... any one using a bot trap?

2007-10-27 Thread Bobby Hartsfield
> I can tell you with absolute certainty that Google obeys robots.txt.
> I'm pretty sure they do. But we all know that sometimes an HTTP request is lost somewhere in cyberspace. If for any reason

Re: SOT but... any one using a bot trap?

2007-10-27 Thread Claude Schneegans
> Define 'bad'.

- Bots that disobey robots.txt;
- bots that do not even offer any search service for visitors searching for you, i.e. useless bots;
- bots that just harvest images (just Google "PicScout AND Getty Images") and steal a huge amount of your bandwidth.

If I was a 'bad' bot and you blocked
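One common (if easily defeated) complement to a trap page is a user-agent filter; the sketch below is purely illustrative, the agent list is a placeholder, and since user-agent strings are trivial to forge it only catches the lazy offenders.

<!--- Substrings of user agents you have decided to refuse --->
<cfset badAgents = "webcopier,emailcollector,larbin">
<cfloop list="#badAgents#" index="agent">
    <cfif findNoCase(agent, cgi.http_user_agent)>
        <cfheader statuscode="403" statustext="Forbidden">
        <cfabort>
    </cfif>
</cfloop>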

SOT but... any one using a bot trap?

2007-10-26 Thread Claude Schneegans
Hi, I tried to implement a bad robot trap, I mean those that do not honor the robots.txt file. Here is the robots.txt file:

User-agent: *
Disallow: /noBots.cfm
Disallow: /bulleltin
Disallow: /admin

In /noBots.cfm, for the time being I just display this: This page was illegitimately indexed by
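Beyond displaying the warning text, a trap page like /noBots.cfm usually records who hit it so the address can be reviewed or blocked later. A minimal sketch follows; the log file name and the completed warning wording are illustrative, not Claude's actual page.

<!--- Record the offender: IP, user agent and referrer --->
<cflog file="botTrap"
       text="Trap hit: #cgi.remote_addr# | #cgi.http_user_agent# | #cgi.http_referer#">

<p>This page was illegitimately indexed by a robot that does not honor robots.txt.</p>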