What type of filtering is going on in your configuration? It might be best to run readdb incrementally on smaller test fetches to make sure you're fetching everything you want to.
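For example, an incremental cycle on a Nutch 1.x install might look something like this (a sketch only; the `crawl/` and `urls/` paths and the `-topN` value are illustrative, not from your setup):

```shell
# Hypothetical paths; adjust to your installation and crawl directory.
bin/nutch inject crawl/crawldb urls

# Repeat the generate/fetch/parse/update cycle in small batches,
# checking the CrawlDb stats after each round.
bin/nutch generate crawl/crawldb crawl/segments -topN 100
SEGMENT=crawl/segments/$(ls -t crawl/segments | head -1)
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb crawl/crawldb "$SEGMENT"

# Watch how db_fetched vs. db_unfetched evolves between rounds.
bin/nutch readdb crawl/crawldb -stats
```

If db_unfetched stays high across rounds, that points at your URL filters (e.g. regex-urlfilter.txt) or depth/topN limits rather than at the site itself.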
On Fri, Sep 30, 2011 at 2:23 PM, Fred Zimmerman <[email protected]> wrote:
> What does this mean? Why is db_unfetched so high?
>
> I want to know how I can be confident that the crawler has fetched all the
> pages in the target site.
>
> CrawlDb statistics start: crawl-20110930124111/crawldb
> Statistics for CrawlDb: crawl-20110930124111/crawldb
> TOTAL urls: 1237
> retry 0: 1236
> retry 1: 1
> min score: 0.0
> avg score: 0.005751819
> max score: 1.0
> status 1 (db_unfetched): 1040
> status 2 (db_fetched): 179
> status 3 (db_gone): 15
> status 5 (db_redir_perm): 3
> CrawlDb statistics: done

--
*Lewis*

