So I add that to the Nutch XML config file? I thought there was a manual way to view and delete URLs in 1.4?
On Mon, Oct 31, 2011 at 3:47 PM, Markus Jelsma <[email protected]> wrote:
> Hi
>
> Write a regex URL filter and use it the next time you update the db; the
> URLs will disappear. Be sure to back up the db first in case your regex
> catches valid URLs. Nutch 1.5 will have an option to keep the previous
> version of the DB after update.
>
> cheers
>
> > We accidentally injected some URLs into the crawl database and I need to
> > go remove them. From what I understand, in 1.4 I can view and modify the
> > URLs and indexes. But I can't seem to find any information on how to do
> > this.
> >
> > Is there anything regarding this available?
>
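Markus's suggestion relies on a regex URL filter, and his warning is that a careless regex can also drop valid URLs. The Python sketch below lets you dry-run a rule set against sample URLs before updating the db. It assumes the line format commonly used in Nutch's `conf/regex-urlfilter.txt` (`+regex` to accept, `-regex` to reject, `#` for comments) and first-match-wins evaluation, where a URL matching no rule is rejected. The host `bad.example.com` and the helpers `parse_rules` and `accepts` are hypothetical; this is not Nutch code, just an approximation of the filter's behavior for checking your patterns.

```python
import re

# Hypothetical rules in the style of conf/regex-urlfilter.txt:
# '+' accepts, '-' rejects, '#' starts a comment.
RULES_TEXT = r"""
# drop the accidentally injected host (hypothetical example)
-^https?://bad\.example\.com/
# accept everything else
+.
"""

def parse_rules(text):
    """Parse '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):  # search(): the regex may match anywhere in the URL
            return accept
    return False

if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in ["http://bad.example.com/page", "http://good.example.org/"]:
        print(url, "keep" if accepts(rules, url) else "drop")
```

Because the first matching rule wins, the reject line for the unwanted host has to come before the catch-all `+.` line; reversing them would keep every URL.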

