Yes, partition your seed list. Consider 3 seeds. If I inject all of
them and then crawl to a depth of 2, this is what could happen:

0: 3 seeds fetched
1: urls found in 0
2: urls found in 1

To approximate depth-first, inject only 1 of the 3 seeds and crawl:

0: 1 seed fetched
1: urls found in 0
2: urls found in 1
(assuming your topN is large enough to exhaust the whole tree)

Then inject seed 2 (of the 3), crawl again, and so on.
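
To make the difference in fetch order concrete, here is a small Python
simulation. It is only a sketch and does not call Nutch: the link graph
LINKS, the page names, and the functions crawl, crawl_all_at_once and
crawl_per_seed are all made up for illustration, and topN is assumed to
be unlimited.

```python
# Hypothetical toy link graph (page -> outlinks); not real Nutch data.
LINKS = {
    "seedA": ["a1", "a2"],
    "a1": ["a11"],
    "a2": [],
    "seedB": ["b1"],
    "b1": ["b11", "b12"],
    "seedC": [],
}


def crawl(seeds, depth, seen=None):
    """Round 0 fetches the seeds; round k fetches the unseen URLs found
    in round k-1. Models a crawl with an unlimited topN."""
    seen = set() if seen is None else seen
    order = []
    frontier = list(seeds)
    for _ in range(depth + 1):
        next_frontier = []
        for url in frontier:
            if url in seen:
                continue  # already fetched in an earlier round or crawl
            seen.add(url)
            order.append(url)
            next_frontier.extend(LINKS.get(url, []))
        frontier = next_frontier
    return order


def crawl_all_at_once(seeds, depth):
    """Inject every seed, then crawl: all trees advance level by level."""
    return crawl(seeds, depth)


def crawl_per_seed(seeds, depth):
    """Inject one seed, crawl its tree to the given depth, then move on."""
    seen = set()
    order = []
    for seed in seeds:
        order.extend(crawl([seed], depth, seen))
    return order


if __name__ == "__main__":
    seeds = ["seedA", "seedB", "seedC"]
    print(crawl_all_at_once(seeds, 2))
    print(crawl_per_seed(seeds, 2))
```

With all seeds injected together, the pages of the three trees are
interleaved level by level. With one seed per crawl, everything
reachable from seedA within the depth limit is fetched before seedB is
touched. Inside each seed's tree the order is still level by level, so
this only approximates depth-first at the granularity of seeds.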

On Wed, Jun 22, 2011 at 1:43 PM, Nutch User - 1 <[email protected]> wrote:
> As far as I have understood Nutch can be used to do breadth-first
> crawling, at least when topN is large enough (<=> every new page gets
> selected in the list of fetch candidates?). What about depth-first? Is
> there any way to make Nutch perform it?
>



-- 
Regards,
K. Gabriele
