Unable to crawl a URL unless session cookies are set

Krishnanand, Kartik Tue, 02 Dec 2014 00:53:52 -0800

Hi,

I am crawling an internal site, but the URL that I want to crawl cannot be 
fetched unless session cookies are set. I hope that someone can help.


When I load this URL in the browser, it does a 301 redirect to another URL that 
sets up cookies that expire at the end of the session. When I load the original 
URL again in the browser, it now loads successfully.
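
The browser behaviour described above (a first request that is redirected and 
picks up session cookies, then a second request for the same URL that 
succeeds) can be reproduced outside the crawler with any cookie-aware HTTP 
client. Below is a minimal sketch in Python using only the standard library. 
It is not Nutch code and does not show how Nutch handles cookies; the function 
name fetch_with_session and the example URL are made up for illustration.

```python
import urllib.request
from http.cookiejar import CookieJar


def fetch_with_session(url, attempts=2, timeout=30):
    """Request `url` several times through one shared cookie jar.

    Hypothetical helper for illustration only. Redirects are followed
    automatically, and cookies set by any response (including the
    redirect target) are stored and sent on later requests.
    Returns a list of (final_url, status, body) tuples, one per attempt,
    together with the cookie jar.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    results = []
    for _ in range(attempts):
        with opener.open(url, timeout=timeout) as response:
            results.append(
                (response.geturl(), response.status, response.read())
            )
    return results, jar


if __name__ == "__main__":
    # Placeholder URL (an assumption); replace with the real internal URL.
    results, jar = fetch_with_session("http://intranet.example/page")
    for final_url, status, body in results:
        print(final_url, status, len(body))
    print("cookies:", [cookie.name for cookie in jar])
```

If the site behaves as described, the first attempt should end on the 
redirect target and the second attempt should end on the original URL, because 
the session cookies are already in the jar by then.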

I don't know how to simulate this in my crawler setup. I am aware of the 
"http.redirect.max" property in our Nutch configuration XMLs. But if I 
understand this correctly, the crawler will follow the redirect and not come 
back to the original URL. Is my understanding correct?

How would I be able to crawl this URL?

Thanks,

Kartik

