I don't think this can be done out of the box, since no state is kept at any point. But you should be able to hack the queue in the fetcher; that is the only place where URLs in a given crawl share a common object in which you could store state.

If URLs are partitioned by host or domain, they are guaranteed to end up in the same queue object. Making this work would still need some serious hacking, such as retrieving the cookie of the first session from the HTTP client and reusing it for subsequent URLs from the same queue. This would be quite a pain to implement if you're unfamiliar with the fetcher, I guess.

> hi
>
> I am crawling a site x.y.z which sets a cookie. Now when Nutch crawls
> another page from the same site, it is not passing this cookie to the
> server, causing many sessions.
>
> I am using Nutch 1.3.
>
> Is there any setting we need to change in order to maintain state within
> the same crawl?
>
> thanks
> abhay
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/maintain-state-between-urls-in-same-crawl-session-tp3691839p3691839.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
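For what it's worth, the per-queue cookie reuse I described above could look roughly like the sketch below. This is purely hypothetical code, not anything in Nutch: the class and method names are made up, and in a real patch you would wire the equivalent logic into the fetcher's queue object and the HTTP protocol plugin. The idea is just to capture the first `Set-Cookie` value seen for a host and attach it to later requests for that same host.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical per-host cookie store (NOT actual Nutch code).
 * A patched fetcher queue could consult something like this so that
 * requests to the same host reuse the cookie from the first response.
 */
public class PerHostCookieStore {
    private final Map<String, String> cookiesByHost = new ConcurrentHashMap<>();

    /** Record the Set-Cookie value from the first response seen for a host. */
    public void storeFromResponse(String url, String setCookieHeader) {
        String host = URI.create(url).getHost();
        if (host != null && setCookieHeader != null) {
            // Keep only the leading name=value pair, dropping attributes
            // such as Path, Expires and HttpOnly.
            cookiesByHost.putIfAbsent(host, setCookieHeader.split(";", 2)[0]);
        }
    }

    /** Cookie header value to attach to a later request, or null if none. */
    public String cookieForRequest(String url) {
        String host = URI.create(url).getHost();
        return host == null ? null : cookiesByHost.get(host);
    }

    public static void main(String[] args) {
        PerHostCookieStore store = new PerHostCookieStore();
        store.storeFromResponse("http://x.y.z/page1",
                "JSESSIONID=abc123; Path=/; HttpOnly");
        // Later fetch from the same host gets the stored cookie back.
        System.out.println(store.cookieForRequest("http://x.y.z/page2"));
        // A different host has no stored cookie.
        System.out.println(store.cookieForRequest("http://other.example/"));
    }
}
```

Since the fetcher queues are already keyed by host (or domain), the map above mirrors that partitioning; the hard part in practice is getting at the raw `Set-Cookie` header inside the HTTP client and injecting the `Cookie` header back into subsequent requests.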

