Hi all,

The following applies to Nutch 1.8 (and at least 1.7 as well, it seems).

I’ve noticed that Nutch throws an exception when the elastic.cluster property 
is not set, even when elastic.host and elastic.port are properly configured. 
The documentation for the elastic properties says that you can specify either 
elastic.cluster, or elastic.host together with elastic.port. 

However, it seems that org.apache.nutch.indexwriter.elastic.ElasticIndexWriter 
throws an exception if elastic.cluster is missing, regardless of whether 
elastic.port and elastic.host have been properly set. The exception is thrown 
in the ElasticIndexWriter.setConf() method.

Is this a known bug, and has it been fixed in the trunk? I was able to get the 
Elasticsearch indexer working properly by setting elastic.host and 
elastic.port, and commenting out the if-statement beginning on line 254 in 
ElasticIndexWriter.java.
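
To illustrate the behaviour I would expect from the documentation, here is a minimal sketch of a check that accepts either elastic.cluster or elastic.host together with elastic.port. This is not the actual Nutch code: the class name ElasticConfCheck and the validate() method are hypothetical, and a plain Map stands in for the Hadoop Configuration object so that the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not taken from ElasticIndexWriter.java.
class ElasticConfCheck {

  // True when the value is null, empty, or whitespace only.
  static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  // Throws only when neither elastic.cluster nor the
  // elastic.host/elastic.port pair is configured.
  // The Map is an assumed stand-in for the Hadoop Configuration.
  static void validate(Map<String, String> conf) {
    boolean hasCluster = !isBlank(conf.get("elastic.cluster"));
    boolean hasHostAndPort = !isBlank(conf.get("elastic.host"))
        && !isBlank(conf.get("elastic.port"));

    if (!hasCluster && !hasHostAndPort) {
      throw new RuntimeException(
          "Missing elastic.cluster, or elastic.host and elastic.port. "
          + "Should be set in nutch-site.xml");
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<String, String>();
    conf.put("elastic.host", "localhost");
    conf.put("elastic.port", "9300");

    // Host and port are set, so no exception is expected here
    // even though elastic.cluster is absent.
    validate(conf);
    System.out.println("host/port configuration accepted");
  }
}
```

With a check along these lines, the properties I list below would be accepted without having to comment anything out.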

For reference, here are the exception and the relevant properties in my 
nutch-site.xml.


***Exception***

Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml 
ElasticIndexWriter
        elastic.cluster : elastic prefix cluster
        elastic.host : hostname
        elastic.port : port
        elastic.index : elastic index command 
        elastic.max.bulk.docs : elastic bulk index doc counts. (default 250) 
        elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)

        at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
        at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

***nutch-site.xml***

<property>
  <name>elastic.host</name>
  <value>localhost</value>
  <description>The hostname to send documents to using TransportClient. Either
  host and port must be defined or cluster.</description>
</property>

<property> 
  <name>elastic.port</name>
  <value>9300</value>
  <description>The port to connect to using TransportClient.</description>
</property>

Cheers

Jake
