-----Original message-----
> From:Sourajit Basak <sourajit.ba...@gmail.com>
> Sent: Tue 11-Jun-2013 14:50
> To: user@nutch.apache.org
> Subject: RSS based crawl - how to crawl ref links in next round
> 
> We are crawling RSS links using a custom plugin. That's working fine.
> 
> Our intention is to crawl the discovered URLs in the subsequent round.
> However, we notice that the discovered links have a status of fetch_success
> and also have a signature.

This should not be true for newly discovered outlinks. Parser plugins return a
List<Outlink>, which carries no signature or fetch status information at all.
Are you sure you haven't already crawled them?
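To illustrate that reasoning, here is a simplified, hypothetical model. It is not Nutch's actual CrawlDb or Generator code, and every class and method name in it is invented for illustration. A newly discovered outlink carries only a target URL and anchor text, so it can only enter the database as unfetched and due immediately. A URL that was already fetched keeps its existing status and signature, and is not selected again until its re-fetch interval (assumed here to be 30 days) has elapsed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified crawl database. These are NOT Nutch classes.
public class CrawlDbSketch {
    enum Status { UNFETCHED, FETCHED }

    // Hypothetical outlink: only a target URL and anchor text,
    // with no fetch status and no signature.
    record Outlink(String toUrl, String anchor) {}

    static final class DbEntry {
        final Status status;
        final long nextFetchTime; // epoch millis when the URL becomes due
        final String signature;   // null until the page has been fetched

        DbEntry(Status status, long nextFetchTime, String signature) {
            this.status = status;
            this.nextFetchTime = nextFetchTime;
            this.signature = signature;
        }
    }

    private final Map<String, DbEntry> db = new HashMap<>();

    // Merge parsed outlinks: unknown URLs are added as UNFETCHED and due now;
    // URLs already present keep their existing status and signature.
    void updateWithOutlinks(List<Outlink> outlinks, long now) {
        for (Outlink o : outlinks) {
            db.putIfAbsent(o.toUrl(), new DbEntry(Status.UNFETCHED, now, null));
        }
    }

    // Record a successful fetch and schedule the next one an interval later.
    void markFetched(String url, String signature, long now, long intervalMillis) {
        db.put(url, new DbEntry(Status.FETCHED, now + intervalMillis, signature));
    }

    // Select every URL whose next fetch time has arrived, sorted for stable output.
    List<String> generate(long now) {
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, DbEntry> e : db.entrySet()) {
            if (e.getValue().nextFetchTime <= now) {
                due.add(e.getKey());
            }
        }
        Collections.sort(due);
        return due;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long interval = 30 * day; // assumed 30-day re-fetch interval
        CrawlDbSketch crawlDb = new CrawlDbSketch();

        // Page "a" was already fetched in an earlier round.
        crawlDb.markFetched("http://example.com/a", "sig-a", 0L, interval);
        // The RSS parse then reports both "a" and a brand-new "b" as outlinks.
        crawlDb.updateWithOutlinks(List.of(
                new Outlink("http://example.com/a", "A"),
                new Outlink("http://example.com/b", "B")), 0L);

        // One day later only the never-fetched URL is due.
        System.out.println(crawlDb.generate(day));
    }
}
```

In this model, an outlink that points at an already-fetched page produces nothing to generate in the next round, which matches the symptom described above.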

> Hence the generate phase in the subsequent round
> isn't producing any URLs to fetch.
> 
> We are setting a non-null empty string as parseText in the custom plugin.
> 
> Any ideas on how to force the second round?
> 
> ~ Sourajit
> 
