Ok great!

In case anybody comes across this thread before Nutch 1.9 is released, and 
needs to get this working, the easiest solution is just to specify the 
elastic.cluster property in nutch-site.xml in addition to the port number and 
host, rather than modifying the source.
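
As a rough sketch, an extra property along these lines in nutch-site.xml should do it. The value shown assumes the cluster still uses Elasticsearch's default name, "elasticsearch"; substitute whatever cluster.name your Elasticsearch configuration sets. The description text is my own wording, not copied from nutch-default.xml.

```xml
<property>
  <name>elastic.cluster</name>
  <!-- Assumed value: Elasticsearch's default cluster name. Replace it with
       your own cluster.name if you have changed it. -->
  <value>elasticsearch</value>
  <description>The cluster name to connect to. Set alongside elastic.host
  and elastic.port to avoid the "Missing elastic.cluster" exception.</description>
</property>
```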

Cheers

Jake 

On Jun 11, 2014, at 8:37 AM, Jake Dodd <[email protected]> wrote:

> Hi all,
> 
> The following applies to Nutch 1.8 (and at least 1.7 as well, it seems).
> 
> I’ve noticed that Nutch throws an exception when the elastic.cluster property 
> is not set, even when elastic.host and elastic.port are properly configured. 
> The documentation for the elastic properties says that you can either 
> specify elastic.cluster, or specify elastic.port together with elastic.host. 
> 
> However, it seems that 
> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter throws an exception 
> if elastic.cluster is missing, regardless of whether elastic.port and 
> elastic.host have been properly set. The exception is thrown in the 
> ElasticIndexWriter.setConf() method.
> 
> Is this a known bug, and has it been fixed in the trunk? I was able to get 
> the Elasticsearch indexer working properly by setting elastic.host and 
> elastic.port, and commenting out the if-statement beginning on line 254 in 
> ElasticIndexWriter.java.
> 
> For reference, here are the exception and the relevant properties in my 
> nutch-site.xml.
> 
> 
> ***Exception***
> 
> Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml
> ElasticIndexWriter
>       elastic.cluster : elastic prefix cluster
>       elastic.host : hostname
>       elastic.port : port
>       elastic.index : elastic index command 
>       elastic.max.bulk.docs : elastic bulk index doc counts. (default 250) 
>       elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
> 
>       at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
>       at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
>       at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
>       at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
>       at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> 
> ***nutch-site.xml***
> 
> <property>
>  <name>elastic.host</name>
>  <value>localhost</value>
>  <description>The hostname to send documents to using TransportClient. Either
>  host and port must be defined or cluster.</description>
> </property>
> 
> <property> 
>  <name>elastic.port</name>
>  <value>9300</value>
>  <description>The port to connect to using TransportClient.</description>
> </property>
> 
> Cheers
> 
> Jake
