"bad" URLs are already in the CrawlDB, and they stay there. You'll need to
update your db with the 'updatedb' command.

On Monday, June 11, 2012, Bai Shen wrote:

> > >
> > > However, I'm still seeing youtube urls in the fetch logs.  I'm using the
> > > -noFilter and -noNorm options with generate.  I'm also not using the
> > > -filter and -normalize options for updatedb.
> >
> > You must either filter out all YT records from the CrawlDB or filter
> > during generating.
> >
> >
> I just tried this and it didn't work.
>
> In my nutch-site.xml I have urlfilter-regex in the plugin.includes.
> In my regex-urlfilter.txt I have -^http://www\.youtube\.com/ right above
> the +. at the bottom.
>
> Yet when I run a crawldb dump, the youtube urls still show up.  What am I
> missing?
>
> Thanks.
>
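
For anyone following the thread, here is a rough Python sketch of how a
first-match-wins +/- rule list like the one quoted above behaves. This is not
Nutch code: the names RULES, accept and filter_urls are made up for the
example, and it assumes the first matching rule decides and that a URL
matching no rule is rejected. It shows two things: the quoted rule only
covers URLs that start with exactly http://www.youtube.com/, and a rule only
affects URLs that are actually run through the filter, so records already
stored stay until they are filtered.

```python
import re

# Hypothetical rule list mirroring the two regex-urlfilter.txt lines quoted
# above: a '-' rule for YouTube followed by the catch-all '+.' rule.
RULES = [
    ("-", re.compile(r"^http://www\.youtube\.com/")),
    ("+", re.compile(r".")),
]


def accept(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


def filter_urls(urls, rules=RULES):
    """Keep only accepted URLs; anything not passed through here is untouched."""
    return [u for u in urls if accept(u, rules)]


if __name__ == "__main__":
    stored = [
        "http://www.youtube.com/watch?v=abc",
        "https://www.youtube.com/watch?v=abc",
        "http://example.com/page.html",
    ]
    for url in stored:
        print(accept(url), url)
```

Note that the https:// variant is not caught by the quoted rule, so a broader
pattern may be needed if such URLs are in the db.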
