Hi.

I'm developing a Java application that integrates Nutch (2.2.1) with 
HBase (0.94.16). I'm getting a NullPointerException in 
org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRules(RegexURLFilterBase.java:179).
I assume this error is related to the following warnings in the log:

....
Jan 30, 2014 12:47:10 AM org.apache.hadoop.conf.Configuration getConfResourceAsReader
INFO: regex-normalize.xml not found
Jan 30, 2014 12:47:10 AM org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer setConf
WARNING: Can't load the default rules!
Jan 30, 2014 12:47:10 AM org.apache.hadoop.conf.Configuration getConfResourceAsReader
INFO: regex-urlfilter.txt not found
Jan 30, 2014 12:47:10 AM org.apache.hadoop.mapred.FileOutputCommitter cleanupJob
WARNING: Output path is null in cleanup
....

Both files are present in the $NUTCH_HOME/conf folder, and both are 
correctly configured in nutch-default.xml:

...
<property>
  <name>urlnormalizer.regex.file</name>
  <value>regex-normalize.xml</value>
  <description>Name of the config file used by the RegexUrlNormalizer class.
  </description>
</property>
...

<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing regular expressions
  used by urlfilter-regex (RegexURLFilter) plugin.</description>
</property>
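
Since the description above says the file is looked up on the CLASSPATH, one check I can run from inside my application is whether the class loader can actually see the two files. The following is only a minimal sketch: the class name ClasspathCheck and its methods are hypothetical (not part of Nutch or Hadoop), and it assumes the lookup goes through the thread context class loader, which may not match exactly what happens inside Nutch.

```java
import java.net.URL;

// Hypothetical helper, not part of Nutch or Hadoop: reports whether a
// resource name can be resolved through the class loader.
class ClasspathCheck {

    // Returns the URL the class loader resolves for the name, or null if it is not found.
    static URL locate(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader when no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(name);
    }

    // True when the resource is visible to the class loader.
    static boolean isOnClasspath(String name) {
        return locate(name) != null;
    }

    public static void main(String[] args) {
        // The two file names reported as "not found" in the log.
        String[] names = {"regex-normalize.xml", "regex-urlfilter.txt"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
        // Print the effective classpath to see whether the conf folder is listed.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If both names are reported as missing, the conf folder itself is probably not on the classpath of the running application, even though the files exist on disk.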

I don't understand why Nutch doesn't find those files; everything seems to be 
in the correct place. Could you help me with this error? Thanks in advance.

Kind Regards,

Mauricio Ciprián Rodríguez




