Ignore external links from crawled domains

2005-08-05 Thread Christophe Noel
Hello, a very basic facility seems to be missing in Nutch. If I have a list of 2000 URLs in the Nutch DB and want to ignore external links, I have to build a regex filter listing the thousands of different domains I want to crawl. There is no parameter to crawl only those domains and ignore external links. At

Re: Ignore external links from crawled domains

2005-08-08 Thread Ken Krugler
A very basic facility seems to be missing in Nutch. If I have a list of 2000 URLs in the Nutch DB and want to ignore external links, I have to build a regex filter listing the thousands of different domains I want to crawl. There is no parameter to crawl only those domains and ignore external links. At these t
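
The workaround described in the thread (one filter rule per crawled domain) could be generated from the seed list rather than written by hand. The following is a minimal sketch, not something taken from the thread: the helper name `build_filter_rules` and the sample URLs are hypothetical, and it assumes the filter file takes one rule per line in the form `+regex` (accept) or `-regex` (reject), with a final `-.` rejecting everything else. It also assumes URLs are normalized to carry a trailing slash after the host.

```python
from urllib.parse import urlparse


def build_filter_rules(seed_urls):
    """Build accept rules for the hosts in seed_urls, then reject the rest.

    Hypothetical helper: the rule syntax (+regex / -regex, one per line)
    is an assumption about the regex URL filter file format.
    """
    hosts = set()
    for url in seed_urls:
        host = urlparse(url).hostname  # lowercased host, or None if absent
        if host:
            hosts.add(host)

    rules = []
    for host in sorted(hosts):
        # Escape dots so "example.com" does not match "exampleXcom".
        escaped = host.replace(".", r"\.")
        rules.append(f"+^https?://{escaped}/")

    # Catch-all: any URL not accepted above is treated as external.
    rules.append("-.")
    return rules


if __name__ == "__main__":
    seeds = [
        "http://www.example.com/a",
        "http://example.org/",
        "http://www.example.com/b",
    ]
    print("\n".join(build_filter_rules(seeds)))
```

With 2000 seed URLs this would still produce a large filter file, which is the cost the original post objects to; the sketch only automates writing it.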