Impossible to say, but perhaps Crawl 2 has more records whose fetch did not return HTTP 200. Look carefully at the fetcher logs and inspect the crawldb with the readdb -stats command.

-----Original message-----
> From: Joe Zhang <smartag...@gmail.com>
> Sent: Thu 29-Nov-2012 07:04
> To: user <user@nutch.apache.org>
> Subject: size of crawl
>
> With the same set of parameters (-depth 5 -topN 200), I run two different
> crawls:
>
> Crawl 1: 2 sites
> Crawl 2: 4 sites (superset of the 2 in Crawl1)
>
> However, I end up having much fewer docs in Crawl 2. Can anybody suggest
> the reason(s)?
>
> Thanks.
>
> Joe.
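This Python sketch follows up on the readdb -stats suggestion by summarizing that command's output. It assumes the stats were saved with something like `bin/nutch readdb crawl/crawldb -stats > stats.txt`. The crawldb path, the file name and the script name `summarize_stats.py` are placeholders. It also assumes Nutch 1.x-style lines such as `status 2 (db_fetched): 1000` and `TOTAL urls: 1234`. Other versions may format these differently, so treat the regexes as assumptions. Comparing the share of db_fetched entries between the two crawls shows whether Crawl 2 has proportionally more unfetched, gone or redirected URLs.

```python
import re
import sys

# Assumed Nutch 1.x format, e.g. "status 2 (db_fetched):\t1000".
# re.search (not match) tolerates log prefixes such as timestamps.
STATUS_RE = re.compile(r"status\s+\d+\s+\((\w+)\):\s+(\d+)")
TOTAL_RE = re.compile(r"TOTAL urls:\s+(\d+)")


def parse_stats(text):
    """Return (total_urls, {status_name: count}) from readdb -stats output."""
    total = None
    counts = {}
    for line in text.splitlines():
        m = TOTAL_RE.search(line)
        if m:
            total = int(m.group(1))
            continue
        m = STATUS_RE.search(line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return total, counts


def summarize(text):
    """Print each CrawlDb status with its share of all entries."""
    total, counts = parse_stats(text)
    if not total:
        print("no 'TOTAL urls' line found")
        return
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{name:<16} {n:>8}  {100.0 * n / total:5.1f}%")
    not_fetched = total - counts.get("db_fetched", 0)
    print(f"entries not in db_fetched: {not_fetched} of {total}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   bin/nutch readdb crawl/crawldb -stats > stats.txt
    #   python summarize_stats.py stats.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        summarize(f.read())
```

Run it once per crawl and compare the results. A large db_gone, db_unfetched or redirect count in Crawl 2 would point at fetch problems rather than at the -depth/-topN settings.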