I'm dealing with a lot of file types that I don't want to index.  I was
originally using the regex filter to exclude them but it was getting out of
hand.

I changed my plugin includes from

urlfilter-regex

to

urlfilter-(regex|suffix)

I've tried two things: editing the default urlfilter-suffix.txt file to add
the extensions I don't want, and making my own file that starts with + and
lists the extensions I do want.

Neither of these approaches seems to work.  I continue to get URLs added to
the database which contain extensions I don't want.  Even adding a
urlfilter.order section to my nutch-site.xml doesn't work.

I don't see any obvious bugs in the code, so I'm a bit stumped.  Any
suggestions for what else to look at?

Thanks.
