What I do is look at the IP addresses that request robots.txt and
treat any similar IP address as a robot. "Similar" means the first
three octets match; the requests for pages often come from a
different machine than the one that fetched robots.txt. It's not
perfect, but it works pretty well.
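
A minimal sketch of that heuristic in Python, assuming the access log is
in Common Log Format and logs IPv4 addresses rather than resolved host
names. The function names (prefix24, robot_prefixes, split_hits) are
made up for illustration; this is not a Tomcat API.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTO" status bytes
# Group 1 is the client address, group 2 is the requested path.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*"')


def prefix24(ip):
    """Return the first three octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def robot_prefixes(lines):
    """Collect the 3-octet prefixes of every client that fetched /robots.txt."""
    prefixes = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2).split("?")[0] == "/robots.txt":
            prefixes.add(prefix24(m.group(1)))
    return prefixes


def split_hits(lines):
    """Split (ip, path) hits into probable-robot hits and everything else."""
    lines = list(lines)  # two passes: one to find robots, one to classify
    robots = robot_prefixes(lines)
    robot_hits, other_hits = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        hit = (m.group(1), m.group(2))
        if prefix24(m.group(1)) in robots:
            robot_hits.append(hit)
        else:
            other_hits.append(hit)
    return robot_hits, other_hits
```

Calling split_hits(open("access_log")) would then give two lists of
(ip, path) pairs; the first list shows which pages the probable robots
requested. The file name here is a placeholder.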

You should also check the User-Agent, as others have suggested.
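
The User-Agent check could look something like the sketch below. It
assumes the log uses the combined format, where the User-Agent is the
last quoted field on each line (in Tomcat I believe that means setting
pattern="combined" on the AccessLogValve). The token list is only an
illustrative guess, not a complete list of crawlers, and the function
names are made up.

```python
import re

# In combined format the User-Agent is the final quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')

# Illustrative substrings only; real crawlers use many other names.
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")


def user_agent(line):
    """Return the trailing quoted field of a log line, or '' if there is none."""
    m = UA_RE.search(line)
    return m.group(1) if m else ""


def looks_like_robot(line):
    """True if the User-Agent contains one of the crawler-like tokens."""
    ua = user_agent(line).lower()
    return any(token in ua for token in BOT_TOKENS)
```

A User-Agent is easy to fake, so this works best combined with the IP
check rather than on its own.
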
--
Len

On 3/27/06, Scott Purcell <[EMAIL PROTECTED]> wrote:
> I really would like to find out how (and whether) my site is being
> indexed. I am using Tomcat 5.5 and running an e-commerce site. I have
> had nothing but trouble getting seen in search engines, so I would like
> to be able to somehow trace which pages robots are indexing.
>
> I did add a robots.txt that allows my whole domain to be indexed. I
> also added an access log, but by default I cannot really tell whether
> users are hitting my site or when robots are visiting it.
>
> Below is my access_log from Saturday. What I would like to be able to
> do is distinguish between normal users and robots. If I can verify
> that the robots are indexing my site, I would be happy, and would then
> know I need to work more on my meta tags. But if they are not indexing
> my site, then I need to find out why.
>
> Does this make sense? Has anyone been through this before? Thanks ahead
> of time,
>
> Regards
>
> Scott
