Re: Removing URLs from Web DB

2006-02-18 Andrzej Bialecki
Chris Schneider wrote:

Nutch Users,

Does anyone have a tool or an easy method for removing URLs matching a certain pattern from the MapReduce crawldb? For example, let's say you've been crawling for a while, and then realize that you're spending a lot of time trying to crawl bogus URLs with f

Removing URLs from Web DB

2006-02-17 Chris Schneider
Nutch Users,

Does anyone have a tool or an easy method for removing URLs matching a certain pattern from the MapReduce crawldb? For example, let's say you've been crawling for a while, and then realize that you're spending a lot of time trying to crawl bogus URLs with fake domains like http:/