Can you give me an example of how I would set up my URL filter to do this?
Right now I'm just using the default.

On Mon, Oct 31, 2011 at 3:47 PM, Markus Jelsma <[email protected]> wrote:

> Hi
>
> Write a regex URL filter and use it the next time you update the db; the
> unwanted URLs will disappear. Be sure to back up the db first in case your
> regex catches valid URLs. Nutch 1.5 will have an option to keep the previous
> version of the DB after update.
>
> cheers
>
> > We accidentally injected some URLs into the crawl database and I need to
> > remove them. From what I understand, in 1.4 I can view and modify the URLs
> > and indexes, but I can't seem to find any information on how to do this.
> >
> > Is there anything regarding this available?
>
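
For reference, a minimal sketch of what such a filter could look like in
conf/regex-urlfilter.txt. The host unwanted.example.com is purely
hypothetical and stands in for whatever was injected by mistake. Rules are
checked top to bottom and the first matching rule decides, so the exclusion
has to sit above the final catch-all "+." line:

```
# Hypothetical rule: drop every URL under the accidentally injected host.
# A leading "-" excludes matching URLs, a leading "+" accepts them.
-^https?://unwanted\.example\.com/

# Accept anything else (this catch-all is already in the default file).
+.
```

After backing up the crawldb, note that the filter is only applied to the db
if filtering is enabled for the update job. As far as I know, in Nutch 1.x the
updatedb command takes a -filter option for this, but check the usage output
of bin/nutch updatedb on your version before relying on it.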
