Hi,
I have set up Nutch to crawl my local filesystem, with topN set to 20 and
depth set to 2. But when Nutch re-crawls, it fetches the same files over and
over again. The directory doesn't contain any subdirectories. Can someone
let me know what might be the cause? There are more than 20 files in the
dire
Hey Shay.
Some friendly advice: cross-posting a question will make you unpopular
fast. It's best to start on the most appropriate-seeming list and only
move on from there if you get no satisfaction. The question below looks
most at home on the archive-access list. Let me
have
Hi all,
I'm using NutchWax (version 0.7.0-200611082313) and Wera (version
0.5.0-200611082313) to index a collection of ARC files generated by a web
crawl with the Heritrix web crawler (version 1.4.0).
When I check the metadata tag on the Wera front-end, the following list of
tags is displayed