Hi all, I would like to use the Nutch crawler to fetch only pages with the extension .html or .htm that contain the keyword "semantic". How can I configure it? In nutch-site.xml I set the property "urlfilter.regex.file" to the value "regex-urlfilter.txt". I left regex-urlfilter.txt as it is, except for the last line: I deleted "+." and put "+\.(html|htm)" in its place. But it does not seem to work!
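
To make the change easier to reason about, here is a rough Python sketch of how I understand the rule file to be evaluated: rules are tried top to bottom, the first regex found anywhere in the URL decides, and a URL that matches no rule is ignored. This is only a simulation, not Nutch code. The RULES list and the accepts() helper are hypothetical names of mine, and the single exclusion rule stands in for whatever other lines the file contains.

```python
import re

# Hypothetical rule list mirroring the edit described above. Only one
# illustrative exclusion rule is kept; the last entry is the replacement
# for the original catch-all "+." line.
RULES = [
    ("-", r"^(file|ftp|mailto):"),  # assumed exclusion rule, for illustration
    ("+", r"\.(html|htm)"),         # replacement for "+."
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose regex is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is ignored

for url in ("http://example.com/page.html",
            "http://example.com/",
            "http://example.com/a.htm?x=1"):
    print(url, accepts(url))
```

If this model is right, a seed URL such as http://example.com/ matches no rule once "+." is gone, so the crawl may never get far enough to discover any .html links. Note also that a URL filter only sees the URL, so it cannot select pages by a keyword in their content.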

