Hi Elisabeth,

Did you sort your redirect problem?

On Sun, Sep 18, 2011 at 3:46 PM, Nutch User - 1 <[email protected]> wrote:

> On 15.09.2011 22:25, Elisabeth Adler wrote:
>
>> Hi,
>>
>> I am having issues crawling an intranet site with an (imho) odd redirect
>> mechanism. One part of the intranet website requires authentication which
>> Nutch can bypass sending a special http.agent.name. This works fine.
>>
>> The issue I am facing is that the server sends a redirect (302) after
>> successful authentication to the same URL. Nutch is not following the
>> redirect. My guess is that Nutch omits the site because it has been visited
>> before...
>>
>> Any pointers on how to overcome this and index the site after the redirect
>> happened are very welcome. My configuration is below.
>> Thanks a lot,
>> Elisabeth
>>
>>
>> I am using nutch-1.3 with
>> http.agent.name = my-nutch-1.3
>> generate.max.per.host = -1
>> fetcher.threads.per.host = 5
>> fetcher.threads.fetch = 5
>> fetcher.server.delay = 1
>> http.redirect.max = 10
>> plugin.includes = protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)
>>
>
> These could give some explanation:
>
> http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html
> http://lucene.472066.n3.nabble.com/A-possible-solution-to-my-URL-redirection-and-zero-scores-problem-td3162164.html
> https://issues.apache.org/jira/browse/NUTCH-1044
>

--
*Lewis*

