I am having trouble getting search engines to index my site. I built it with 
Struts, and the index.html page uses some tags to run business logic before 
finally reaching my page. The URL is http://www.theuniquepear.com

Some co-workers suggested I watch my access log to see which files the search 
bots are indexing. I thought I had the access log turned on for the site, and 
I do see entries when someone hits my web site, but as far as the search bots 
go, this is all I see in my logs each day:

$ cat  localhost_access_log.2006-02-07.txt | less
67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227


I see the requests for robots.txt, but I have no idea where the bots go after 
that, or what they are doing.
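
To make it easier to see where each client goes, something like the script 
below could group the requested paths by client IP. This is only a rough 
sketch: it assumes the "common" log format shown in the log above, and the 
names LOG_LINE and paths_by_client are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the start of a common-format access log line:
# client IP, identity, user, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')


def paths_by_client(lines):
    """Group requested paths by client IP, keeping request order."""
    seen = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # skip lines that do not look like log entries
            client, _method, path = match.groups()
            seen[client].append(path)
    return dict(seen)


if __name__ == "__main__":
    # Sample lines copied from the log above.
    sample = [
        '67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985',
        '67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844',
        '62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402',
    ]
    for client, paths in paths_by_client(sample).items():
        print(client, paths)
```

Passing an open log file instead of the sample list should work the same way, 
since the function only iterates over lines.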

I turned on the access log in server.xml like so:
        <Valve className="org.apache.catalina.valves.AccessLogValve"
                 directory="logs"  prefix="localhost_access_log." suffix=".txt"
                 pattern="common" resolveHosts="false"/>

The log snippet above was produced with this configuration.

Does anyone know how to get more detailed log output, or can anyone tell me 
what the robots.txt requests above are doing?


Thanks,
Scott
