Turns out it was because I had a copy of the default file sitting in the directory I was calling nutch from.
Once I removed that, it correctly found my copy in the conf directory.

On Wed, Jun 12, 2013 at 9:29 AM, Bai Shen <[email protected]> wrote:

> Doh! I really should just read the code of things before posting.
>
> I ran the URLFilterChecker and passed it a url that the SuffixFilter
> should flag and it still passed it. However, if I change the url to end in
> a format that is in the default config file, it rejects the url.
>
> So it looks like the problem is that it's not loading the altered config
> file from my conf directory. Not sure why since the regex filter correctly
> finds its config file.
>
>
> On Wed, Jun 12, 2013 at 8:34 AM, Markus Jelsma <[email protected]>
> wrote:
>
>> We happily use that filter just as it is shipped with Nutch. Just
>> enabling it in plugin.includes works for us. To ease testing you can use
>> the bin/nutch org.apache.nutch.net.URLFilterChecker to test filters.
>>
>>
>> -----Original message-----
>> > From: Bai Shen <[email protected]>
>> > Sent: Wed 12-Jun-2013 14:32
>> > To: [email protected]
>> > Subject: Suffix URLFilter not working
>> >
>> > I'm dealing with a lot of file types that I don't want to index. I was
>> > originally using the regex filter to exclude them but it was getting
>> > out of hand.
>> >
>> > I changed my plugin includes from
>> >
>> > urlfilter-regex
>> >
>> > to
>> >
>> > urlfilter-(regex|suffix)
>> >
>> > I've tried both using the default urlfilter-suffix.txt file, adding the
>> > extensions I don't want, and making my own file that starts with + and
>> > includes the extensions I do want.
>> >
>> > Neither of these approaches seems to work. I continue to get urls added
>> > to the database which contain extensions I don't want. Even adding a
>> > urlfilter.order section to my nutch-site.xml doesn't work.
>> >
>> > I don't see any obvious bugs in the code, so I'm a bit stumped. Any
>> > suggestions for what else to look at?
>> >
>> > Thanks.
>> >
>
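
For anyone hitting the same symptom, one quick check is to list every location that holds a file with the config file's name, since a stray copy can be picked up ahead of the intended one. Below is a minimal Python sketch, not part of Nutch. It assumes the file is called urlfilter-suffix.txt as in the thread. The helper name `find_copies` and the two directories searched are purely illustrative and do not reflect Nutch's actual lookup order.

```python
from pathlib import Path


def find_copies(filename, directories):
    """Return every existing copy of `filename` found directly inside
    `directories`, preserving the order in which they were given."""
    matches = []
    for directory in directories:
        candidate = Path(directory) / filename
        if candidate.is_file():
            matches.append(candidate.resolve())
    return matches


if __name__ == "__main__":
    # Hypothetical locations: the directory nutch is launched from,
    # followed by its conf subdirectory.
    search_dirs = [Path.cwd(), Path.cwd() / "conf"]
    for path in find_copies("urlfilter-suffix.txt", search_dirs):
        print(path)
```

If more than one path is printed, it is worth confirming which copy the filter actually loads, for example by rerunning the URLFilterChecker after moving the extra copy out of the way.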

