Impossible to say for certain, but perhaps Crawl 2 fetched more records 
with a non-200 status. Look carefully at the fetcher logs and inspect the 
crawldb of each crawl with the readdb -stats command. 
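
To make the comparison concrete, here is a minimal sketch that puts the status counts of two crawls side by side. It assumes you have saved the output of something like `bin/nutch readdb crawl1/crawldb -stats` for each crawl to a text file (the paths and file names are hypothetical; depending on your logging setup the figures may appear on the console or in a log file). It also assumes the status lines look like `status 2 (db_fetched): 1234`, which is how I recall Nutch 1.x printing them; adjust the pattern if your version differs.

```python
import re
import sys

# Assumed line shape in `readdb -stats` output, e.g. "status 2 (db_fetched):	1234".
# search() is used so that a log prefix (timestamp, logger name) is tolerated.
STATUS_LINE = re.compile(r"status\s+(\d+)\s+\((\w+)\):\s*(\d+)")


def status_counts(stats_text):
    """Return {status_name: count} for every status line found in the text."""
    counts = {}
    for line in stats_text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            counts[match.group(2)] = int(match.group(3))
    return counts


def compare(stats_a, stats_b):
    """Return rows of (status_name, count_a, count_b), sorted by status name."""
    a, b = status_counts(stats_a), status_counts(stats_b)
    return [(name, a.get(name, 0), b.get(name, 0)) for name in sorted(set(a) | set(b))]


if __name__ == "__main__":
    # Hypothetical usage: python compare_stats.py crawl1.txt crawl2.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for name, n1, n2 in compare(f1.read(), f2.read()):
                print(f"{name:20s} {n1:8d} {n2:8d}")
```

If Crawl 2 shows a much larger share of unfetched, gone, or redirected records than Crawl 1, that would point to where the missing docs went.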
 
-----Original message-----
> From:Joe Zhang <smartag...@gmail.com>
> Sent: Thu 29-Nov-2012 07:04
> To: user <user@nutch.apache.org>
> Subject: size of crawl
> 
>  With the same set of parameters (-depth 5 -topN 200), I run two different
> crawls:
> 
> Crawl 1: 2 sites
> Crawl 2: 4 sites (superset of the 2 in Crawl1)
> 
> However, I end up having far fewer docs in Crawl 2. Can anybody suggest
> the reason(s)?
> 
> Thanks.
> 
> Joe.
> 
