Hi Elisabeth,

Did you sort your redirect problem?

On Sun, Sep 18, 2011 at 3:46 PM, Nutch User - 1 <[email protected]> wrote:

> On 15.09.2011 22:25, Elisabeth Adler wrote:
>
>> Hi,
>>
>> I am having issues crawling an intranet site with an (imho) odd redirect
>> mechanism. One part of the intranet website requires authentication, which
>> Nutch can bypass by sending a special http.agent.name. This works fine.
>>
>> The issue I am facing is that, after successful authentication, the server
>> sends a redirect (302) to the same URL. Nutch is not following the
>> redirect. My guess is that Nutch skips the page because the URL has been
>> visited before...
>>
>> Any pointers on how to overcome this and index the site after the redirect
>> happened are very welcome. My configuration is below.
>> Thanks a lot,
>> Elisabeth
>>
>>
>> I am using nutch-1.3 with
>> http.agent.name = my-nutch-1.3
>> generate.max.per.host = -1
>> fetcher.threads.per.host = 5
>> fetcher.threads.fetch = 5
>> fetcher.server.delay = 1
>> http.redirect.max = 10
>> plugin.includes = protocol-http|urlfilter-regex|
>> parse-html|index-(basic|anchor)|query-(basic|site|url)
>> |response-(json|xml)|summary-basic|scoring-opic|
>> urlnormalizer-(pass|regex|basic)
>>
>>
>>
> These could give some explanation:
>
> http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html
> http://lucene.472066.n3.nabble.com/A-possible-solution-to-my-URL-redirection-and-zero-scores-problem-td3162164.html
> https://issues.apache.org/jira/browse/NUTCH-1044
>



-- 
*Lewis*
