Yes, partition your seed list. Consider 3 seeds. If I inject all of them and then crawl to a tree depth of 2, this is what (could) happen:

0: 3 seeds fetched
1: urls found in 0
2: urls found in 1

To get closer to depth-first, inject only 1 of the 3 seeds and crawl:

0: 1 seed fetched
1: urls found in 0
2: urls found in 1
(assuming your topN is large enough to exhaust the whole tree)

Then inject seed 2 (of the 3), crawl again, and so on.

On Wed, Jun 22, 2011 at 1:43 PM, Nutch User - 1 <[email protected]> wrote:
> As far as I have understood Nutch can be used to do breadth-first
> crawling, at least when topN is large enough (<=> every new page gets
> selected in the list of fetch candidates?). What about depth-first? Is
> there any way to make Nutch perform it?

--
Regards,
K. Gabriele
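
To make the difference in fetch order concrete, here is a small toy simulation of the two strategies described above. It is only a sketch and does not call Nutch: the link graph, the URL names, and the functions `crawl` and `crawl_seed_by_seed` are all made up for illustration, and it assumes topN is large enough that every discovered URL is fetched in the next round.

```python
def crawl(seeds, links, depth, seen=None):
    """Simulate rounds 0..depth: round 0 fetches the seeds, round k
    fetches the not-yet-seen URLs discovered in round k-1."""
    seen = set() if seen is None else seen
    frontier = []
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            frontier.append(seed)
    order = []
    for _ in range(depth + 1):
        order.extend(frontier)
        next_frontier = []
        for url in frontier:
            for out in links.get(url, ()):
                if out not in seen:
                    seen.add(out)
                    next_frontier.append(out)
        frontier = next_frontier
    return order


def crawl_seed_by_seed(seeds, links, depth):
    """Inject one seed at a time and finish its crawl before the next."""
    seen = set()  # shared across runs, so nothing is fetched twice
    order = []
    for seed in seeds:
        order.extend(crawl([seed], links, depth, seen))
    return order


# Hypothetical link graph: three seeds A, B, C and their outlinks.
links = {
    "A": ["A1", "A2"],
    "A1": ["A1a"],
    "B": ["B1"],
    "B1": ["B1a"],
    "C": ["C1"],
}

all_at_once = crawl(["A", "B", "C"], links, depth=2)
one_by_one = crawl_seed_by_seed(["A", "B", "C"], links, depth=2)
print(all_at_once)
print(one_by_one)
```

With all seeds injected together, every round mixes pages from all three sites. With one seed per run, the whole tree under A is fetched before B is touched, which is depth-first at the level of seeds, although each individual run still proceeds breadth-first.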

