Re: Web spiders - disabling jsessionid

2006-12-04 Thread Christopher Schultz
Bryce, brycenesbitt wrote: > > Caldarale, Charles R wrote: >> Try turning off cookies in your browser. >> > > Sorry for the lack of clarity. I can't force jsessionid to show up even with > cookies off in the browser. My guess is that there is a pag

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
- Original Message From: brycenesbitt [EMAIL PROTECTED] >>A quick google search will show this happens to many other people -- even if >>your webapps are magically immune. http://www.citycarshare.org/ is >>definitely affected. It's not magically immune. It's just built differently fro

Re: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
Rashmi Rubdi wrote: > > I don't know because this problem doesn't happen in my case, on 2 > different web applications. > > Bryce should really test his case by setting cookies="true" or remove the > cookies attribute and test his links with Xenu to see if he still gets > jsessionid with Xenu.

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
One thing about search engine bots though is that repairs to jsessionid (removing jsession id) from URLs won't be instantaneous, because they cache all URLs, and on subsequent visits they visit each cached URL. This means that even if you solve the problem of jsessionid now, you will still see

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
--- Original Message From: Len Popp <[EMAIL PROTECTED]> To: Tomcat Users List Sent: Sunday, December 3, 2006 8:10:00 PM Subject: Re: Web spiders - disabling jsessionid On 12/3/06, Rashmi Rubdi <[EMAIL PROTECTED]> wrote: > No, I'm using Tomcat 5.5. And I've omitted the

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Len Popp
On 12/3/06, Rashmi Rubdi <[EMAIL PROTECTED]> wrote: No, I'm using Tomcat 5.5. And I've omitted the cookies attribute of Context in my Tomcat settings. And Googlebot or any other bot is accessing the URLs just fine (that is, without the jsessionid). When I look in the server access logs, jses

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
- Original Message From: brycenesbitt [EMAIL PROTECTED] >>Rashmi Rubdi wrote: >> >>So the solution for Bryce would be to leave the session on on each JSP >> page, and omit the cookies attribute of <Context>, or set it to true. >>This should solve the problem of jsessionid for bots. >> From my observation se

Re: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
Rashmi Rubdi wrote: > > So the solution for Bryce would be to leave the session on on each JSP > page, and omit the cookies attribute of <Context>, or set it to true. > This should solve the problem of jsessionid for bots. > From my observation search bots support cookies otherwise I would have the > problem of jses

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
Original Message From: Eric Haszlakiewicz [EMAIL PROTECTED] >> Perhaps that is the /quickest/ solution, but I would argue that the best >> solution is not to create a session if you don't actually need one. >heh. yeah, not creating the session is definitely NOT the quickest way. :) >e

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Eric Haszlakiewicz
On Fri, Dec 01, 2006 at 04:50:02PM -0500, Christopher Schultz wrote: > Mikolaj Rydzewski wrote: > > Caldarale, Charles R wrote: > >> That contradicts what Len said about his site: > >> > >> "On my site (as on many others) you can browse the site without a > >> session, but if you want to log in (to

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
Or simply leave out the cookies attribute in your Context, this defaults to cookies = "true" anyway. >>No option seems to match the need: >>true -- uses URL-rewriting if the browser does not support cookies. this is >>exactly the problem, as spiders don't use cookies. No. Googlebot and other

RE: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
Caldarale, Charles R wrote: > >> From: brycenesbitt [mailto:[EMAIL PROTECTED] >> Subject: Re: Web spiders - disabling jsessionid >> Creating semicolon-based URL strings is the default in >> Tomcat/Struts. > > I don't know about Struts, but that's

RE: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
Caldarale, Charles R wrote: > > Try turning off cookies in your browser. > Sorry for the lack of clarity. I can't force jsessionid to show up even with cookies off in the browser. Using wget from the Unix command line (no cookies!) I get a jsessionid for images: /images/logo.gif;jsessionid=C0

Re: Web spiders - disabling jsessionid

2006-12-03 Thread brycenesbitt
Rashmi Rubdi wrote: > > As discussed previously in this thread you can turn jsessionid in the URL > off easily by setting the cookies attribute of http://tomcat.apache.org/tomcat-5.0-doc/config/context.html > No option seems to match the need: true -- uses URL-rewriting if the browser does n

RE: Web spiders - disabling jsessionid

2006-12-03 Thread Caldarale, Charles R
> From: brycenesbitt [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > I can't force a ";" based jsessionid to show in Firefox. Try turning off cookies in your browser. - Chuck THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTH

RE: Web spiders - disabling jsessionid

2006-12-03 Thread Caldarale, Charles R
> From: brycenesbitt [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > Creating semicolon-based URL strings is the default in > Tomcat/Struts. I don't know about Struts, but that's not true for Tomcat. Look at the cookies at

Re: Web spiders - disabling jsessionid

2006-12-03 Thread Rashmi Rubdi
- Original Message From: brycenesbitt <[EMAIL PROTECTED]> >>The problem in many cases is the author does not care about sessions at all! >>Creating semicolon-based URL strings is the default in Tomcat/Struts. We >>get session ID's not because we want a session, but because we can't figur

Re: Web spiders - disabling jsessionid

2006-12-02 Thread brycenesbitt
Google's index has 33.4 million pages with a jsessionid: http://www.google.com/search?hl=en&lr=&q=inurl%3Ajsessionid&btnG=Search Many of those are duplicates (no different other than the jsessionid). -Bryce Nesbitt -- View this message in context: http://www.nabble.c

Re: Web spiders - disabling jsessionid

2006-12-02 Thread brycenesbitt
Christopher Schultz-2 wrote: > > Perhaps that is the /quickest/ solution, but I would argue that the best > solution is not to create a session if you don't actually need one. > The problem in many cases is the author does not care about sessions at all! Creating semicolon-based URL strings i

Re: Web spiders - disabling jsessionid

2006-12-02 Thread Bryce Nesbitt
>Hi, >As you may know url rewriting feature is not a nice thing when spiders >come to index your site - >http://gabrito.com/post/javas-seo-blunder-jsessionid. I'm having such trouble with JSESSIONID and search engines Google, Accoona, Alexa and Exalead. My approach was to contact each firm, and a

Re: Web spiders - disabling jsessionid

2006-12-02 Thread brycenesbitt
Mikolaj Rydzewski-2 wrote: > > Hi, > As you may know url rewriting feature is not a nice thing when spiders > come to index your site - > http://gabrito.com/post/javas-seo-blunder-jsessionid. > I'm having trouble with JSESSIONID with search engines Google, Accoona, Alexa and Exalead. My appr

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Rashmi Rubdi
- Original Message From: "Caldarale, Charles R" [EMAIL PROTECTED] >> From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] >> Subject: Re: Web spiders - disabling jsessionid >> >> I think then, setting cookies to "true", or simply leaving >> o

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > I think then, setting cookies to "true", or simply leaving > out the cookies attribute should solve the original poster's > problem with disabling JSESSIONID

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Rashmi Rubdi
- Original Message From: "Caldarale, Charles R" [EMAIL PROTECTED] >>From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] >> >> There's no jsessionid appended at the end of URLs that the >> bot requests. >Depends on what the value of the cookies attribute for the <Context> is; >if false, or the app ch

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Rashmi Rubdi [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > There's no jsessionid appended at the end of URLs that the > bot requests. Depends on what the value of the cookies attribute for the <Context> is; if false, or the app chooses to

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Rashmi Rubdi
> Caldarale, Charles R wrote: > Filter with wrapper ServletResponse is IMO the best solution. > You can apply it to almost every application without touching the code. >>Perhaps that is the /quickest/ solution, but I would argue that the best >>solution is not to create a session if you don't actu

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
On 12/1/06, Caldarale, Charles R <[EMAIL PROTECTED]> wrote: > From: Len Popp [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > On my site (as on many others) you can browse the site without a > session, but if you want to log in (to

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Mikolaj, Mikolaj Rydzewski wrote: > Caldarale, Charles R wrote: >> That contradicts what Len said about his site: >> >> "On my site (as on many others) you can browse the site without a >> session, but if you want to log in (to add content or to use >

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Len Popp
On 12/1/06, Caldarale, Charles R <[EMAIL PROTECTED]> wrote: > From: Chris Adams [mailto:[EMAIL PROTECTED] > Subject: RE: Web spiders - disabling jsessionid > > That's not true. A session id is assigned the moment you hit > the site. That contradicts what Len said ab

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Mikolaj Rydzewski
Caldarale, Charles R wrote: That's not true. A session id is assigned the moment you hit the site. That contradicts what Len said about his site: "On my site (as on many others) you can browse the site without a session, but if you want to log in (to add content or to use personalized se

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Chris Adams [mailto:[EMAIL PROTECTED] > Subject: RE: Web spiders - disabling jsessionid > > That's not true. A session id is assigned the moment you hit > the site. That contradicts what Len said about his site: "On my site (as on many others) you can

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Chris Adams
still manage the "anonymous" user's session. - Chris -Original Message- From: Caldarale, Charles R [mailto:[EMAIL PROTECTED] Sent: Friday, December 01, 2006 7:14 PM To: Tomcat Users List Subject: RE: Web spiders - disabling jsessionid > From: Len Popp [mailto:[EMAIL

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Caldarale, Charles R
> From: Len Popp [mailto:[EMAIL PROTECTED] > Subject: Re: Web spiders - disabling jsessionid > > On my site (as on many others) you can browse the site without a > session, but if you want to log in (to add content or to use > personalized settings) you need a session. O.k.

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Len Popp
On 12/1/06, Christopher Schultz <[EMAIL PROTECTED]> wrote: Mikolaj, Back to the original question... Mikolaj Rydzewski wrote: > As you may know url rewriting feature is not a nice thing when spiders > come to index your site - > http://gabrito.com/post/javas-seo-blunder-jsessionid. So, the pro

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Mikolaj, Back to the original question... Mikolaj Rydzewski wrote: > As you may know url rewriting feature is not a nice thing when spiders > come to index your site - > http://gabrito.com/post/javas-seo-blunder-jsessionid. So, the problem is that y

RE: Web spiders - disabling jsessionid

2006-12-01 Thread Chris Adams
mall number (e.g. 1 or 2) of those hits would come from this "google-incognito" agent. - Chris -Original Message- From: Christopher Schultz [mailto:[EMAIL PROTECTED] Sent: Friday, December 01, 2006 3:13 PM To: Tomcat Users List Subject: Re: Web spiders - disabling jsessionid ---

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
Hello Christopher, When I check my access logs I could imagine Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) being a Google bot. Of course I don't know it for sure, because I don't do any SEO cloaking here, and don't care. But one could go to SEO boards, pick the posted IP addresses for cloa

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Christopher Schultz
Leon, Leon Rosenberg wrote: > you believe everything you've been told? :-) Well, I've been told by you, and I don't believe you. ;) > Google has 3 (at least 3 known) user agents: google, Mozilla with > google-bot in the agent string (the one you se

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Mikolaj Rydzewski
Leon Rosenberg wrote: Google uses this 3rd agent to check your site from another IP address, whether you do some ugly SEO stuff, like cloaking etc. Seems possible. So please don't do it, if you rely on being found. I think that just removing ;jsessionid=XXX for the first one won't make much ha

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
You believe everything you've been told? :-) Google has 3 (at least 3 known) user agents: google, Mozilla with google-bot in the agent string (the one you sent) and another one, which is just Mozilla/5.0. Google uses this 3rd agent to check your site from another IP address, whether you do some

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Tim Funk
Wrong. Google is very clear about not hiding its user agent, as are the other major bots. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Just look for Googlebot in the user-agent header. -Tim Leon Rosenberg wrote: On 12/1/06, Tim Funk <[EMAIL PROTECTED]> wrote: T
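
A minimal sketch of the User-Agent check suggested in the message above, in plain Java with no servlet dependency. The class name `BotUserAgents`, the method name `looksLikeBot`, and the one-entry token list are assumptions made for illustration; only the "Googlebot" token and the sample header values come from the thread. In a servlet the input would normally come from the request's User-Agent header.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tomcat or the Servlet API.
final class BotUserAgents {
    // Illustrative token list; "googlebot" is the only token named in the thread.
    private static final String[] TOKENS = {"googlebot"};

    private BotUserAgents() {}

    // Returns true when the header value contains a known bot token (case-insensitive).
    static boolean looksLikeBot(String userAgent) {
        if (userAgent == null) {
            return false; // no header sent: treat as an ordinary client
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

A substring match like this only identifies crawlers that announce themselves; an agent sending a generic browser string, as debated elsewhere in the thread, would not be detected.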

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Leon Rosenberg
On 12/1/06, Tim Funk <[EMAIL PROTECTED]> wrote: The easiest is the filter and custom HttpServletResponse which overrides encodeURL() to do nothing. It could be made one step smarter by checking if the User agent is a search engine bot to selectively execute or not. How do you want to achieve

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Tim Funk
The easiest is the filter and custom HttpServletResponse which overrides encodeURL() to do nothing. It could be made one step smarter by checking if the User agent is a search engine bot to selectively execute or not. -Tim Mikolaj Rydzewski wrote: Hi, As you may know url rewriting feature
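
The wrapper idea described above can be sketched without the Servlet API. Everything named here is a stand-in chosen for illustration: `UrlEncodingResponse` plays the role of `HttpServletResponse`, `RewritingResponse` mimics a container that appends the session ID (real containers place it before any query string, which this simplification ignores), and `NoRewriteResponse` is the wrapper whose `encodeURL()` does nothing. In a real filter the wrapper would more likely extend `HttpServletResponseWrapper` and be passed down the filter chain.

```java
// Stand-in for the part of HttpServletResponse that matters here (assumption).
interface UrlEncodingResponse {
    String encodeURL(String url);
    String getContentType();
}

// Mimics a container that falls back to URL rewriting (simplified).
final class RewritingResponse implements UrlEncodingResponse {
    private final String sessionId;

    RewritingResponse(String sessionId) {
        this.sessionId = sessionId;
    }

    @Override
    public String encodeURL(String url) {
        return url + ";jsessionid=" + sessionId;
    }

    @Override
    public String getContentType() {
        return "text/html";
    }
}

// Wrapper: delegates everything except encodeURL, which returns the URL untouched.
final class NoRewriteResponse implements UrlEncodingResponse {
    private final UrlEncodingResponse delegate;

    NoRewriteResponse(UrlEncodingResponse delegate) {
        this.delegate = delegate;
    }

    @Override
    public String encodeURL(String url) {
        return url; // deliberately a no-op: never append ;jsessionid
    }

    @Override
    public String getContentType() {
        return delegate.getContentType();
    }
}
```

Because application code only sees the wrapper, existing calls to `encodeURL()` keep working and simply stop producing session IDs in links, which is why this approach needs no changes to the pages themselves.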

Re: Web spiders - disabling jsessionid

2006-12-01 Thread Andrew Stepanenko
Hello, we use a filter in web.xml as you said: StripSessionIdFilter. Works fine. Not that much overhead. Regards Andrew Stepanenko, Ternopil, Ukraine http://unf.tane.edu.ua On 12/1/06, Mikolaj Rydzewski <[EMAIL PROTECTED]> wrote: Hi, As you may know url rewriting feature is not a nice thing whe
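
The source of StripSessionIdFilter is not shown in the thread, so the following is only a guess at the string-level operation such a filter might apply to URLs: removing a `;jsessionid=...` path parameter while leaving the path and any query string alone. The class name `SessionIdStripper`, the regular expression, and the sample session ID values are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not the StripSessionIdFilter mentioned above.
final class SessionIdStripper {
    // Matches ";jsessionid=<value>", where the value ends at '?', '#', ';', '/' or end of input.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#;/]*", Pattern.CASE_INSENSITIVE);

    private SessionIdStripper() {}

    // Returns the URL with any jsessionid path parameter removed.
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }
}
```

Applied to a URL of the kind reported in the thread, `strip("/images/logo.gif;jsessionid=ABC123")` yields `/images/logo.gif`, and a query string after the session ID is preserved.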