I have an entry in regex-urlfilter.txt designed to prevent crawling of URLs that are part of our UPS search app.
    # skip URLs from the UPS search app
    -\?ups=
    -index.php/ups\?aa

When I test the URLs, it appears that regex-urlfilter should exclude them, for example:

    echo "http://redacted.com/index.php/ups?aa" | /usr/local/apache-nutch/bin/nutch org/apache/nutch/net/URLFilterChecker -filterName org.apache.nutch.urlfilter.regex.RegexURLFilter
    Checking URLFilter org.apache.nutch.urlfilter.regex.RegexURLFilter
    -http://redacted.com/index.php/ups?aa

But when I run 'crawl', it does not skip these URLs.

Thanks for any help in showing me what I'm missing here.
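
For reference, the two patterns can also be checked outside Nutch. The following is only a minimal Python sketch, not Nutch code: `RULES` and `filter_url` are hypothetical names, and it assumes the filter applies rules in file order, treats each pattern as an unanchored (partial) match, lets the first matching rule decide, and drops URLs that match no rule. It exercises the patterns only and says nothing about which configuration the crawl actually loads.

```python
import re

# The two exclusion rules from the regex-urlfilter.txt entry, as (sign, pattern).
RULES = [
    ("-", r"\?ups="),
    ("-", r"index.php/ups\?aa"),
]


def filter_url(url, rules=RULES):
    """Return the URL if a '+' rule accepts it, otherwise None.

    Hypothetical helper that mimics first-match-wins rule evaluation.
    """
    for sign, pattern in rules:
        # Assumption: patterns are unanchored, so a partial match is enough.
        if re.search(pattern, url):
            return url if sign == "+" else None
    # Assumption: a URL that matches no rule at all is dropped.
    return None


if __name__ == "__main__":
    for candidate in (
        "http://redacted.com/index.php/ups?aa",
        "http://redacted.com/index.php?ups=1",
    ):
        verdict = "accepted" if filter_url(candidate) else "rejected"
        print(verdict, candidate)
```

Under those assumptions both sample URLs are rejected, which agrees with the URLFilterChecker output above.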

