Update: Robots cannot read JSP?

Scott Purcell Fri, 17 Feb 2006 06:42:31 -0800

I started the thread below last weekend, and following the suggestions
there, earlier this week I replaced some of the JavaScript redirects into
my site with JSP redirects.


In a nutshell, I am trying to make sure that robots can index my web
site.
My web site is a Struts application deployed as the default (root) app,
and I configured the root app to use a .jsp as its welcome-file. So when
a user hits the URL www.theuniquepear.com, it goes to a JSP page, which
then does a JSP redirect to www.theuniquepear.com/unique/welcome.do;
from there, the way Struts is set up, the action finally leads to the JSP.

Since I don't really understand how robots work, I tried curl. If I now
just issue

curl www.theuniquepear.com

it shows nothing and does not follow the redirect. But if I run
curl -L www.theuniquepear.com, all is good, and that is what I want the
robots to read.
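The difference between the two curl runs is the -L flag: without it, curl prints the body of a redirect response and stops; with it, curl follows the Location header. To see exactly what a client that does not follow redirects gets back from the welcome page, here is a minimal JDK-only sketch (the class name RedirectCheck is hypothetical). It prints the status code and the Location header of a single GET, roughly what `curl -I` would show:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectCheck {

    /**
     * Issues one GET without following redirects (like curl without -L)
     * and returns "<status> <Location header, or '-' if absent>".
     */
    static String probe(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // do not chase 3xx responses
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode(); // a 500 is returned here, not thrown
            String location = conn.getHeaderField("Location");
            return status + " " + (location == null ? "-" : location);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the site from this thread; pass another URL to override.
        String target = args.length > 0 ? args[0] : "http://www.theuniquepear.com/";
        System.out.println(probe(target));
    }
}
```

A 301 or 302 whose Location points at /unique/welcome.do is a redirect that a crawler can choose to follow. A 500 means the server failed while handling the request, so there is nothing for the robot to follow.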

I made the change around last Monday. Each day I check my access log,
and the only robot entries I see are requests that get a 500 response,
after which the robots are gone. When I search Google for my site,
nothing shows up.
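Instead of scanning the log by eye every day, the 5xx responses can be pulled out mechanically. The sketch below assumes the access log is written with pattern="common" (as in the Valve configuration quoted further down), i.e. `%h %l %u %t "%r" %s %b`. The class name ServerErrors is hypothetical, and lines that do not match that layout (for example, truncated ones) are skipped:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServerErrors {

    // pattern="common": host ident user [time] "request" status bytes
    private static final Pattern COMMON = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"([^\"]*)\" (\\d{3}) \\S+");

    /** Returns "host status request" for every line whose status is 5xx. */
    static List<String> serverErrors(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = COMMON.matcher(line);
            if (m.lookingAt() && m.group(3).startsWith("5")) {
                hits.add(m.group(1) + " " + m.group(3) + " " + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ServerErrors logs/localhost_access_log.YYYY-MM-DD.txt
        for (String hit : serverErrors(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(hit);
        }
    }
}
```

Knowing which exact request line produced the 500 (the path and the HTTP version the robot used) makes it easier to reproduce the failure with curl and to find the matching stack trace in Tomcat's other logs.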

Does anyone know whether robots follow redirects the way curl -L does,
or whether they behave like plain curl and never index my site? What is
really silly is that even this email will probably be found when I
search for my URL. Currently, if you type 'the unique pear' into Google,
you see all the threads I have started on this subject, but the site
itself is nowhere to be found ... not good for business.

Any input would be appreciated.

Thanks,



-----Original Message-----
From: Mike Sabroff [mailto:[EMAIL PROTECTED] 
Sent: Saturday, February 11, 2006 11:09 AM
To: Tomcat Users List
Subject: Re: Robots cannot read JSP?

Scott,
Your assessment is incorrect! First off, curl doesn't "read" HTML pages
at all; it does a GET or POST to a URL just as though you clicked it in
your browser (and curl can do a lot of other things besides). Second, it
is not the JSP that is the problem; it is the JavaScript, as Tim said,
and the lack of links.

Mike
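The "lack of links" point can be checked mechanically: a crawler that does not run JavaScript can only move on through URLs that appear in the markup itself, such as the href of an <a> tag. A rough sketch (the class name LinkScan is hypothetical, and the regular expression is a crude stand-in for a real HTML parser) that lists those targets:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkScan {

    // Crude: only href="..." inside <a ...> tags; real crawlers parse HTML properly.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href targets of <a> tags found in the given markup. */
    static List<String> links(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = ANCHOR_HREF.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // A page whose only URL lives inside a script, like the home page quoted below.
        String home = "<body><script>top.location.href = "
                + "\"http://www.theuniquepear.com/unique/index.jsp\";</script>hello</body>";
        System.out.println(links(home)); // []
        System.out.println(links("<a href=\"/unique/welcome.do\">Shop</a>")); // [/unique/welcome.do]
    }
}
```

An empty result for the home page is the situation Tim describes: the only way forward is a script, so a robot that ignores scripts has nowhere to go.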

David Smith wrote:
> I doubt the problem is that curl cannot read files other than .htm
> or .html. The problem is that only browsers execute JavaScript.
> Think of curl or the search engines as a browser without JavaScript
> enabled. What would you get in IE or Firefox if you disabled JavaScript?
>
> -- David
>
> Scott Purcell wrote:
>> Tim,
>> Thanks a lot for the info. I got to thinking and tried invoking curl
>> on the URL from my box, and I see exactly what you saw: the JS is
>> screwing things up.
>>
>> So I decided to run curl on different pages, and I came to the
>> conclusion that only .htm or .html pages show up via curl?
>>
>> Does anyone think that robots are just like curl, and that they can
>> only read HTML files?
>>
>> Thanks for everything. I know this is a bit off topic ... and I hope I
>> don't hack anyone off.
>>
>> Thanks
>> Scott
>>
>> -----Original Message-----
>> From: Tim Funk [mailto:[EMAIL PROTECTED]
>> Sent: Friday, February 10, 2006 8:50 PM
>> To: Tomcat Users List
>> Subject: Re: Access log to see where robots go.
>>
>> The problem is your home page, not robots.txt. When / is requested, the
>> following is served back; notice the JavaScript redirect (the full file
>> is below):
>>
>> ----
>>    function invokeWebApp() {
>>      top.location.href =
>> "http://www.theuniquepear.com/unique/index.jsp";;
>>    }
>> ----
>> Search engines do not execute JavaScript, and there are no links on the
>> page, so search engines have nowhere to go (except someone else's
>> site).
>>
>> As much as I detest SEO companies, you might find it helpful to search
>> for one for some assistance.
>>
>> <html>
>> <head>
>>    <head>
>>      <title>The Unique Pear | Unique Home Decor & Accessories</title>
>>      <meta name="description" content="The Unique Pear is an
>>      online boutique specializing in home decor & accessories. Products
>>      include clocks, candles, wall decor, garden, lighting, bath and more.">
>>      <meta name="keywords" content="The Unique Pear Timework clocks,
>>      lamps, lamp shades, candles, aroma, aroma difuser, wall decor, wall
>>      scounces, wrought iron, pitchers, bookstands, jaqua bath products,
>>      candleholders">
>>                  <meta name="description" content="">
>> <meta name="keywords" content="">
>>   </head>
>> <body bgcolor="#FFFFFF">
>>
>> <script language = "javascript">
>>    //<!--
>>    function invokeWebApp() {
>>      top.location.href =
>> "http://www.theuniquepear.com/unique/index.jsp";;
>>    }
>>    invokeWebApp();
>>    // -->
>> </script>
>>
>> hello
>> </body>
>> </html>
>>
>> -Tim
>>
>> Scott Purcell wrote:
>>  
>>> I have had trouble getting search engines to see my site. I built it
>>> with Struts, and use some tags from the index.html page to get business
>>> logic, to finally get to my page. The url is
>>> http://www.theuniquepear.com
>>>
>>> Anyway, after I talked to some co-workers, they suggested I watch my
>>> access log so I can see what files the robots are indexing. I thought I
>>> had the access log turned on for the site, and I do see when someone
>>> hits my web site, but as far as the search bots go, I only see this in
>>> my logs daily:
>>>
>>> $ cat localhost_access_log.2006-02-07.txt | less
>>> 67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
>>> 67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
>>> 67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
>>> 62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home
>>> 62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
>>> 62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227
>>>
>>> I see the entries for robots.txt, but I have no idea where the robots
>>> are going, or what they are doing.
>>>
>>> I turned on the access log in server.xml like so:
>>>         <Valve className="org.apache.catalina.valves.AccessLogValve"
>>>                  directory="logs"  prefix="localhost_access_log."
>>>                  suffix=".txt"
>>>                  pattern="common" resolveHosts="false"/>
>>>
>>> And that is a snippet of the log from above.
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>

-- 
Mike Sabroff
Web Services Developer
[EMAIL PROTECTED]
920-568-8379

