I have had trouble getting search engines to index my site. I built it with
Struts, and the index.html page uses some tags to invoke business logic before
finally getting to my page. The URL is http://www.theuniquepear.com
When I talked to some co-workers, they suggested I watch my access log so I
can see which files the search engines are indexing. I thought I had the
access log turned on for the site, and I do see entries when someone hits my
web site, but as far as the search bots go, this is all I see in my logs each
day:
$ less localhost_access_log.2006-02-07.txt
67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227
I see the requests for robots.txt, but I have no idea where the bots are
going or what they are doing.
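To get a clearer picture of who fetches what, the entries could be grouped by
client address. The following is only a minimal Python sketch, assuming each
entry sits on a single line in the common log format; the names LINE_RE,
parse_line, and requests_by_host are made up for illustration and are not
part of Tomcat.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [timestamp] "request line" status bytes
# (assumption: one complete entry per line, as written by pattern="common")
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]+))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def requests_by_host(lines):
    """Group (path, status) pairs by client address, keeping log order."""
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:  # skip truncated or malformed lines
            grouped[entry["host"]].append((entry["path"], int(entry["status"])))
    return dict(grouped)


# Three complete entries taken from the log excerpt in this post.
sample = [
    '67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985',
    '67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844',
    '62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] '
    '"GET /unique/includes/siteWide.css HTTP/1.1" 200 15402',
]

for host, hits in requests_by_host(sample).items():
    print(host, hits)
```

Run against the whole file (for example by passing an open file object to
requests_by_host), this would list every path each address requested along
with the status code it got back, which makes a client that only ever asks
for /robots.txt and / easy to spot.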
I turned on the access log in server.xml like so:
<Valve className="org.apache.catalina.valves.AccessLogValve"
directory="logs" prefix="localhost_access_log." suffix=".txt"
pattern="common" resolveHosts="false"/>
The log snippet above was produced with this configuration.
Does anyone know how to get more detailed log output, or can anyone tell me
what the robots.txt requests above are doing?
Thanks,
Scott