-----Original message-----
> From: Sourajit Basak <sourajit.ba...@gmail.com>
> Sent: Tue 11-Jun-2013 14:50
> To: user@nutch.apache.org
> Subject: RSS based crawl - how to crawl ref links in next round
>
> We are crawling RSS links using a custom plugin. Thats working fine.
>
> Our intention is to crawl the discovered urls in the subsequent round.
> However, we notice that the links discovered have a status fetch_success &
> also has a signature.

This should not be the case for newly discovered outlinks. Parser plugins only return Outlink objects, and those carry no signature or fetch status information at all. Are you sure you haven't already crawled them?

> Hence the generate phase in the subsequent round
> isn't producing any urls to fetch.
>
> We are setting a non-null empty string as parseText in the custom plugin.
>
> Any ideas on how to force the second round ?
>
> ~ Sourajit
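To make the point about newly discovered outlinks concrete, here is a deliberately simplified model in Java. The types below (OutlinkSketch, Link, Entry, Status) are hypothetical and are NOT Nutch's API; the sketch only assumes what is said above: a parsed outlink carries a target URL and anchor text but no signature or fetch status, so it should enter the next round unfetched, while a page that already has a signature was fetched before and is not selected again until it is due.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: none of these types are Nutch classes.
public class OutlinkSketch {

    // Loosely modelled on CrawlDb states such as db_unfetched and db_fetched.
    enum Status { UNFETCHED, FETCHED }

    // What a parser hands back per link: a target URL and anchor text, nothing more.
    record Link(String toUrl, String anchor) {}

    // A CrawlDb-style entry; a signature only exists once the page itself was fetched.
    record Entry(String url, Status status, String signature, long nextFetchTime) {}

    // A newly discovered link enters the db unfetched, without a signature, due now.
    static Entry fromOutlink(Link link, long now) {
        return new Entry(link.toUrl(), Status.UNFETCHED, null, now);
    }

    // Simplified generate step: select only entries whose next fetch time has arrived.
    static List<String> generate(List<Entry> db, long now) {
        List<String> selected = new ArrayList<>();
        for (Entry e : db) {
            if (e.nextFetchTime() <= now) {
                selected.add(e.url());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        long thirtyDays = 30L * 24 * 60 * 60 * 1000;
        List<Entry> db = new ArrayList<>();
        // Fresh outlink from the RSS parser: eligible in the next round.
        db.add(fromOutlink(new Link("http://example.com/new-article", "New article"), now));
        // Page fetched in an earlier round: has a signature and is not due again yet.
        db.add(new Entry("http://example.com/old-article", Status.FETCHED, "abc123", now + thirtyDays));
        System.out.println(generate(db, now)); // prints [http://example.com/new-article]
    }
}
```

If the discovered URLs in your CrawlDb look like the second entry rather than the first, that suggests they were fetched in an earlier round rather than freshly discovered by your plugin.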