Re: Nutch Several Segment Folders Containing Duplicate Key/URLs

2013-12-24 Thread Tejas Patil
You ran 3 rounds of the Nutch crawl ("-depth 3"), and those 3 folders are the 3 segments created, one per round of the crawl. About the 520 URLs, I don't see any obvious reason for that happening. You should look at a few of the newly added URLs, check what their parent URLs were, and then run a small crawl using those ...
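For reference, a minimal sketch of what such a crawl and its segment output typically look like with the 1.x "crawl" command; the seed directory, topN value, and segment timestamps below are made up for illustration:

    # one crawl job with three rounds ("-depth 3"); paths and topN are hypothetical
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

    # afterwards there is one segment directory per round, named by timestamp
    ls crawl/segments/
    20131224103015  20131224104502  20131224110247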

Nutch Several Segment Folders Containing Duplicate Key/URLs

2013-12-24 Thread Bin Wang
Hi, I have a very specific list of URLs to crawl, and I implemented it by turning off this property:

    db.update.additions.allowed = false
    "If true, updatedb will add newly discovered URLs, if false only already
    existing URLs in the CrawlDb will be updated and no new URLs will be added."
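As a sketch of that override (assuming it goes into conf/nutch-site.xml, which takes precedence over the default in nutch-default.xml), the entry would look something like:

    <!-- keep updatedb from adding newly discovered URLs to the CrawlDb -->
    <property>
      <name>db.update.additions.allowed</name>
      <value>false</value>
      <description>If true, updatedb will add newly discovered URLs, if false
      only already existing URLs in the CrawlDb will be updated and no new
      URLs will be added.</description>
    </property>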