What I do is look at the IP addresses that request robots.txt and treat any similar IP address as a robot. "Similar" means the first 3 bytes match, because the requests for pages often come from a different machine than the one that checked robots.txt. It's not perfect, but it works pretty well.
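As a rough illustration of that approach, here is a minimal Python sketch. It assumes the access log uses the common log format (host, ident, user, timestamp, quoted request line, status, bytes) and that the host field holds a dotted IPv4 address rather than a resolved hostname. The function names (prefix24, parse, split_robots) are made up for this example and are not part of Tomcat.

```python
import re

# Assumed line shape (common log format):
#   host ident user [time] "METHOD path PROTOCOL" status bytes
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*"')


def prefix24(ip):
    """Return the first three bytes of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def parse(lines):
    """Yield (ip, path) for each line that matches the assumed format."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group(1), m.group(2)


def split_robots(lines):
    """Return (robot_hits, other_hits), each a list of (ip, path)."""
    hits = list(parse(lines))
    # Pass 1: collect the prefix of every address that fetched robots.txt.
    robot_prefixes = {
        prefix24(ip) for ip, path in hits
        if path.split("?")[0] == "/robots.txt"
    }
    # Pass 2: any request from a matching prefix is counted as a robot.
    robots = [h for h in hits if prefix24(h[0]) in robot_prefixes]
    others = [h for h in hits if prefix24(h[0]) not in robot_prefixes]
    return robots, others
```

Calling split_robots(open("access_log")) (the file name is a placeholder) gives the two lists. Two passes are used so that page requests logged before the robots.txt request from the same network are still classified as robot traffic.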
You should also check the User-Agent, as others have suggested.

-- Len

On 3/27/06, Scott Purcell <[EMAIL PROTECTED]> wrote:
> I really would like to find out how (and if) my site is being
> indexed. I am using Tomcat 5.5 and I am running an ecommerce site. I have
> had nothing but trouble getting seen in search engines, so I would like
> to be able to somehow trace what pages robots are indexing.
>
> I did add a robots.txt and allow my whole domain to be indexed. So that
> being said, I added an access log, but by default I cannot really tell
> if I have users hitting my site, or when robots are visiting my site.
>
> Below is my access_log from Saturday. What I would like to be able to
> do is distinguish between normal users and robots. If I can verify
> that the robots are indexing my site, I would be happy, and then know I
> need to work more meta. But if they are not indexing my site, then I
> need to find out why.
>
> Does this make sense? Has anyone been through this before? Thanks ahead
> of time,
>
> Regards
>
> Scott