Nutch Re-crawl same file over and over again

2006-12-06 Thread Armel T. Nene
Hi, I have set up Nutch to crawl my local filesystem. I set topN to 20 and depth to 2. But when Nutch re-crawls, it re-crawls the same files over and over again. The directory doesn't contain any other sub-directories; can someone let me know what might be the cause? There are more than 20 files in the dire

Re: [Archive-access-discuss] Full List of Metadata Fields

2006-12-06 Thread Michael Stack
Hey Shay. Some friendly advice: cross-posting a question will make you unpopular fast. It's best to start on the most appropriate-seeming list and only move on from there if you are getting no satisfaction. The question below looks most at home on the archive-access list. Let me have

Full List of Metadata Fields

2006-12-06 Thread Shay Lawless
Hi all, I'm using NutchWax (Version 0.7.0-200611082313) and Wera (Version 0.5.0-200611082313) to index a collection of ARC files generated by a web crawl using the Heritrix web crawler (Version 1.4.0). When I check the metadata tag on the Wera front-end, the following list of tags is displayed