Hi,
Thanks for all the links. I had a quick look through them already, but was stuck with work, so I will get around to testing it this afternoon. I'll let you know how it goes.
Best,
Elisabeth

On 21.09.2011 09:32, lewis john mcgibbney wrote:
Hi Elisabeth,

Did you sort your redirect problem?

On Sun, Sep 18, 2011 at 3:46 PM, Nutch User - 1 <[email protected]> wrote:

On 15.09.2011 22:25, Elisabeth Adler wrote:

Hi,

I am having issues crawling an intranet site with an (imho) odd redirect
mechanism. One part of the intranet website requires authentication which
Nutch can bypass by sending a special http.agent.name. This works fine.

The issue I am facing is that the server sends a redirect (302) after
successful authentication to the same URL. Nutch is not following the
redirect. My guess is that Nutch omits the site because it has been visited
before...
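
To make the guess above concrete, here is a toy sketch in Python. It is
not Nutch's actual code: the crawl loop, the fake server and the
skip_seen_redirect_targets switch are all hypothetical and only model a
crawler that refuses to re-fetch a URL it has already seen.

```python
from collections import deque


def make_server():
    """Hypothetical server: the first hit on a URL answers 302 to the same URL,
    every later hit answers 200 with content."""
    hits = {}

    def fetch(url):
        hits[url] = hits.get(url, 0) + 1
        if hits[url] == 1:
            return 302, url  # redirect to the very same URL
        return 200, "content of " + url

    return fetch


def crawl(seeds, fetch, max_redirects=10, skip_seen_redirect_targets=True):
    """Toy crawl loop (an assumption for illustration, not Nutch's logic)."""
    seen = set(seeds)
    queue = deque(seeds)
    fetched = {}
    while queue:
        url = queue.popleft()
        current = url
        for _ in range(max_redirects + 1):
            status, payload = fetch(current)
            if status == 200:
                fetched[url] = payload
                break
            if status != 302:
                break
            target = payload
            if skip_seen_redirect_targets and target in seen:
                break  # target counts as already visited, so the page is lost
            seen.add(target)
            current = target
    return fetched


url = "http://intranet.example/page"  # placeholder URL
print(crawl([url], make_server()))
print(crawl([url], make_server(), skip_seen_redirect_targets=False))
```

With the "already seen" check on, the page is never stored; with it off,
the second request succeeds. Whether Nutch behaves like the first case
here is exactly what I am unsure about.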

Any pointers on how to overcome this and index the site after the redirect
happened are very welcome. My configuration is below.
Thanks a lot,
Elisabeth


I am using nutch-1.3 with
http.agent.name = my-nutch-1.3
generate.max.per.host = -1
fetcher.threads.per.host = 5
fetcher.threads.fetch = 5
fetcher.server.delay = 1
http.redirect.max = 10
plugin.includes = protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)



These could give some explanation:

http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html
http://lucene.472066.n3.nabble.com/A-possible-solution-to-my-URL-redirection-and-zero-scores-problem-td3162164.html
https://issues.apache.org/jira/browse/NUTCH-1044


