Chris Schneider wrote:
Nutch Users,
Does anyone have a tool or an easy method for removing URLs matching
a certain pattern from the MapReduce crawldb? For example, let's say
you've been crawling for a while, and then realize that you're
spending a lot of time trying to crawl bogus URLs with fake domains
like http:/