You ran 3 rounds of Nutch crawl ("-depth 3"), and those 3 folders are the 3
segments, one created for each round of the crawl.
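
To check the "one segment per round" point on disk, a minimal sketch along these lines could list the segment folders. The path "crawl/segments" and the helper name list_segments are assumptions for illustration, not something taken from your setup; adjust the path to wherever your crawl directory lives.

```python
from pathlib import Path


def list_segments(segments_dir):
    """Return the names of the sub-folders of segments_dir, sorted.

    Hypothetical helper: it only lists directories and does not read
    any Nutch data structures.
    """
    return sorted(p.name for p in Path(segments_dir).iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumed location of the segments folder; change it to match your crawl.
    segments_path = Path("crawl/segments")
    if segments_path.is_dir():
        names = list_segments(segments_path)
        print(f"{len(names)} segment(s) found")
        for name in names:
            print(name)
```

With "-depth 3" you would expect this to report three folders, one per round.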
About the 520 URLs, I don't see any obvious reason for that happening. You
should look at a few of the new URLs that were added, check what their parent
URLs were, and then run a small crawl using tho
Hi,
I have a very specific list of URLs to crawl, and I restricted the crawl to
that list by turning off this property:
<property>
  <name>db.update.additions.allowed</name>
  <value>false</value>
  <description>If true, updatedb will add newly discovered URLs, if false
  only already existing URLs in the CrawlDb will be updated and no new
  URLs will be added.</description>
</property>
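
The behaviour described for db.update.additions.allowed can be pictured with a toy model. This is only a sketch of the semantics as stated in the property description, not Nutch's actual updatedb code; the function name update_crawldb, the status strings, and the example.com URLs are all made up for illustration.

```python
def update_crawldb(crawldb, discovered, additions_allowed):
    """Toy model of the updatedb step (hypothetical, not Nutch code).

    crawldb and discovered map URL -> status string. URLs already in
    crawldb are always updated; URLs that are not yet in crawldb are
    added only when additions_allowed is True.
    """
    updated = dict(crawldb)
    for url, status in discovered.items():
        if url in updated or additions_allowed:
            updated[url] = status
    return updated


# Made-up data: one injected URL, plus one newly discovered outlink.
crawldb = {"http://example.com/a": "unfetched"}
discovered = {
    "http://example.com/a": "fetched",
    "http://example.com/b": "unfetched",
}

restricted = update_crawldb(crawldb, discovered, additions_allowed=False)
unrestricted = update_crawldb(crawldb, discovered, additions_allowed=True)
```

In this model, restricted still contains only the injected URL (with its status updated), while unrestricted also picks up the newly discovered one.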