Hi. I'm developing a Java application that integrates Nutch (2.2.1) with HBase (0.94.16). I'm getting a NullPointerException in org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRules(RegexURLFilterBase.java:179). I assume this error is related to the following warnings in the log:

....
Jan 30, 2014 12:47:10 AM org.apache.hadoop.conf.Configuration getConfResourceAsReader
INFO: regex-normalize.xml not found
Jan 30, 2014 12:47:10 AM org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer setConf
WARNING: Can't load the default rules!
Jan 30, 2014 12:47:10 AM org.apache.hadoop.conf.Configuration getConfResourceAsReader
INFO: regex-urlfilter.txt not found
Jan 30, 2014 12:47:10 AM org.apache.hadoop.mapred.FileOutputCommitter cleanupJob
WARNING: Output path is null in cleanup
....

Both files are present in the $NUTCH_HOME/conf folder, and both are configured in nutch-default.xml:

...
<property>
  <name>urlnormalizer.regex.file</name>
  <value>regex-normalize.xml</value>
  <description>Name of the config file used by the RegexUrlNormalizer class.
  </description>
</property>
...
<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing regular expressions
  used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>

I don't understand why Nutch doesn't find these files; everything seems to be in the correct place.

Could you help me with this error?

Thanks in advance.

Kind regards,
Mauricio Ciprián Rodríguez
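
Since the property description above says the file is looked up on the CLASSPATH, and the "not found" messages come from Configuration.getConfResourceAsReader, one way to narrow this down may be to check whether the two file names resolve through the class loader of the running application. The following is only a minimal sketch under that assumption: the class name ClasspathCheck and the method locate are hypothetical, and it uses plain Java rather than any Nutch or Hadoop API.

```java
import java.net.URL;

public class ClasspathCheck {

    // Returns the URL a resource name resolves to, or null if the class
    // loader cannot see it. Uses the thread context class loader first,
    // falling back to this class's own loader.
    public static URL locate(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClasspathCheck.class.getClassLoader();
        }
        return cl.getResource(name);
    }

    public static void main(String[] args) {
        // File names taken from the log and nutch-default.xml above.
        String[] names = {"regex-normalize.xml", "regex-urlfilter.txt"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> "
                    + (url == null ? "NOT FOUND on classpath" : url.toString()));
        }
        // Print the effective classpath so a missing conf directory is easy to spot.
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
    }
}
```

If this is run with the same classpath as the application and both names print as not found, the conf directory is probably not on the application's classpath, even though the files exist under $NUTCH_HOME/conf.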