If this happens during fetching, just create a new FetchItem and add it to the fetch queues:

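// Text is org.apache.hadoop.io.Text; CrawlDatum is org.apache.nutch.crawl.CrawlDatum
FetchItem fit = FetchItem.create(new Text("http://url"),
    new CrawlDatum(CrawlDatum.STATUS_LINKED, interval), queueMode);
// the item goes into the queue matching its queue ID (host, domain or IP, per queueMode)
fetchQueues.addFetchItem(fit);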
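To illustrate why adding items mid-fetch keeps politeness intact, here is a minimal, self-contained sketch of per-host queueing. It is not Nutch's FetchItemQueues: the class PoliteQueues and its methods are hypothetical. It assumes the queue key is the lowercased host (roughly what fetcher.queue.mode=byHost does) and a fixed delay counted from when a URL is handed out, whereas Nutch, as far as I know, starts its fetcher.server.delay when the previous fetch for that queue finishes.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of per-host fetch queues (NOT Nutch's code).
 * Every host has one FIFO queue and one "next allowed fetch" time, so a URL
 * added while fetching is in progress shares the delay of that host's
 * earlier URLs.
 */
public class PoliteQueues {
    private final long crawlDelayMs;
    private final Map<String, Deque<String>> queues = new HashMap<>();
    private final Map<String, Long> nextFetchTime = new HashMap<>();

    public PoliteQueues(long crawlDelayMs) {
        this.crawlDelayMs = crawlDelayMs;
    }

    /** Queue key: the lowercased host of the URL (assumed byHost-style mode). */
    static String queueId(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase();
    }

    /** Appends a URL to its host's queue; safe to call mid-fetch. */
    public synchronized void add(String url) {
        queues.computeIfAbsent(queueId(url), k -> new ArrayDeque<>()).addLast(url);
    }

    /** Returns a URL whose host is allowed to be fetched at nowMs, or null if none is. */
    public synchronized String next(long nowMs) {
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            String host = e.getKey();
            Deque<String> q = e.getValue();
            if (q.isEmpty()) continue;
            // skip hosts still inside their politeness window
            if (nowMs < nextFetchTime.getOrDefault(host, 0L)) continue;
            nextFetchTime.put(host, nowMs + crawlDelayMs);
            return q.pollFirst();
        }
        return null;
    }
}
```

Because a URL added mid-fetch is keyed to the same queue as the host's earlier URLs, it simply waits behind that host's delay; nothing extra is needed to keep it apart from other hosts.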

> Hi,
> 
> Being new to Nutch, I'm unsure how Nutch deals with fetching more
> URLs for a site that is currently being fetched.
> 
> e.g. if we inject -> generate -> fetch, then whilst fetching we want
> to add more URLs for potentially the same sites currently being
> fetched, what's in place to ensure that we continue to adhere to the
> politeness policy? As I understand it, with the initial fetch all URLs
> for the same site are partitioned into the same segment, though I'm
> unsure what might maintain this for the second.
> 
> Perhaps it's as simple as only allowing 1 fetch process / cluster at a
> time, and then limiting the number of URLs / host to ensure the second
> can start in a timely manner?
> 
> Thanks for your help,
> Dan
