Re: Access log to see where robots go.

Mark Hagger Sat, 11 Feb 2006 09:20:00 -0800

robots.txt is a standard file that search engines should request before trying 
to index your site.  It allows you to block the indexer from your site, either 
completely or partially.  Try a Google search for "robots.txt" for more 
details.
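
As a rough illustration of "completely or partially", here is a small Python
sketch using the standard library's urllib.robotparser. The helper name
allowed(), the agent name "ExampleBot", and the three rule sets are made up
for the example; only the paths are taken from your log.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt lines.

    `agent` is a hypothetical robot name, used only for illustration.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return parser.can_fetch(agent, path)

# Three hypothetical robots.txt contents:
no_rules = []                                                   # empty file
block_all = ["User-agent: *", "Disallow: /"]                    # block everything
block_part = ["User-agent: *", "Disallow: /unique/includes/"]   # block one directory

for name, rules in [("no_rules", no_rules),
                    ("block_all", block_all),
                    ("block_part", block_part)]:
    for path in ["/unique/welcome.do", "/unique/includes/siteWide.css"]:
        print(name, path, allowed(rules, path))
```

With no rules at all, every path comes back as allowed; a "Disallow" line is
what keeps a well-behaved robot out of a path.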


Not having one is the same as saying "feel free to index my entire site", so 
in your case that's not causing any problems.  The 404 responses for 
/robots.txt in your log just mean the file doesn't exist.
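
If you want to see where a robot went after asking for robots.txt, you could
filter the access log by client address. The following is only a sketch: the
function name paths_by_robot() and the regular expression are my own, it
assumes the "common" log pattern you configured, and it treats any client
that requested /robots.txt as a robot. The sample lines are copied from your
log, with the wrapped lines joined.

```python
import re
from collections import defaultdict

# Common log format: host ident user [time] "method path protocol" status bytes
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)$'
)

def paths_by_robot(lines):
    """Map each client that requested /robots.txt to its (path, status) requests."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip wrapped, truncated, or otherwise unparseable lines
        host, _when, _method, path, status, _size = match.groups()
        seen[host].append((path, int(status)))
    # Assumption: a client that asked for /robots.txt is a robot.
    return {host: reqs for host, reqs in seen.items()
            if any(path == "/robots.txt" for path, _ in reqs)}

sample = [
    '67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985',
    '67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844',
    '67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985',
    '62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] '
    '"GET /unique/includes/siteWide.css HTTP/1.1" 200 15402',
]

robots = paths_by_robot(sample)
for host, requests in robots.items():
    print(host, requests)
```

On those sample lines only 67.15.16.30 is reported, and the only page it
requested besides robots.txt is "/".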

Mark


On Saturday 11 February 2006 16:57, Ed Bicker wrote:
> Hello Scott,
> I have had a similar problem. Can you let me know if this is resolved on your
> end? Sometimes the email response coming back to me gets buried in another
> folder and I never get to see the resolution.
> I can't seem to get search engines to see my site either. I do not know
> how to resolve this.
>
> Thanks
> Ed
> [EMAIL PROTECTED]
>
>
> -----Original Message-----
> From: Scott Purcell [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 10, 2006 8:40 PM
> To: Tomcat Users List
> Subject: Access log to see where robots go.
>
>
> I have had trouble getting search engines to see my site. I built it with
> Struts, and use some tags from the index.html page to get business logic,
> to finally get to my page. The URL is http://www.theuniquepear.com
>
> Anyway, after talking to some co-workers, they suggested I watch my access
> log, so I can see what files the search engines are indexing. I thought I had
> the access log turned on for the site, and I do see when someone hits my web
> site, but as far as the searchbots go, I only see this in my logs daily.
>
> $ cat  localhost_access_log.2006-02-07.txt | less
> 67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
> 67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
> 67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
> 62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home
> 62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
> 62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227
>
>
> I see the entry for robots.txt, but I have no idea where they are going, or
> what they are doing.
>
> I turned on the access log in server.xml like so:
>         <Valve className="org.apache.catalina.valves.AccessLogValve"
>                  directory="logs"  prefix="localhost_access_log." suffix=".txt"
>                  pattern="common" resolveHosts="false"/>
>
> And that is a snippet of the log from above.
>
> Does anyone know how to get more detailed log output, or can anyone tell me
> what the robots.txt request above is doing?
>
>
> Thanks,
> Scott
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
> ________________________________________________________________________
> This email has been scanned for all known viruses by the MessageLabs
> SkyScan service.

