Julien Nioche created NUTCH-2069:
------------------------------------

             Summary: Ignore external links based on domain
                 Key: NUTCH-2069
                 URL: https://issues.apache.org/jira/browse/NUTCH-2069
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher, parser
    Affects Versions: 1.10
            Reporter: Julien Nioche
             Fix For: 1.11

We currently have `db.ignore.external.links`, which is a nice way of restricting the crawl based on the hostname. This adds a new parameter, `db.ignore.external.links.domain`, to do the same based on the domain.
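
To illustrate the difference between the two checks, here is a minimal sketch in plain Java. It is not the Nutch code: the class `ExternalLinkCheck` and its methods are hypothetical names, and the domain is approximated as the last two labels of the hostname, an assumption that breaks for suffixes such as `co.uk`. A real implementation would rely on a proper domain-suffix list.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of Nutch.
class ExternalLinkCheck {

    // Lower-cased hostname of a URL, or "" if the URL has none.
    static String host(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h.toLowerCase(Locale.ROOT);
    }

    // ASSUMPTION: the domain is the last two labels of the hostname.
    // This is wrong for multi-part suffixes such as "co.uk".
    static String naiveDomain(String url) {
        String h = host(url);
        String[] labels = h.split("\\.");
        int n = labels.length;
        return n <= 2 ? h : labels[n - 2] + "." + labels[n - 1];
    }

    // Host-based check: any link to a different hostname is external.
    static boolean isExternalByHost(String fromUrl, String toUrl) {
        return !host(fromUrl).equals(host(toUrl));
    }

    // Domain-based check: only a link to a different domain is external.
    static boolean isExternalByDomain(String fromUrl, String toUrl) {
        return !naiveDomain(fromUrl).equals(naiveDomain(toUrl));
    }

    public static void main(String[] args) {
        String from = "http://www.example.org/index.html";
        String to = "http://blog.example.org/post";
        System.out.println(isExternalByHost(from, to));   // prints "true"
        System.out.println(isExternalByDomain(from, to)); // prints "false"
    }
}
```

With the host-based check the link from `www.example.org` to `blog.example.org` would be ignored, whereas the domain-based check would keep it, since both hosts share the domain `example.org`.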