Hi Jake

This has been fixed in trunk. See
https://github.com/apache/nutch/commit/026b2ff414bcf166de4bfeabef57f0202375ea38#diff-68fe6210481889b1947da1fe7d7ed0afL254
and https://issues.apache.org/jira/browse/NUTCH-1745
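
For anyone reading this without the commit at hand, the behaviour the
documentation describes (either elastic.cluster, or elastic.host together
with elastic.port) could be checked roughly as below. This is only a sketch
of the idea and not the code from the commit: the class ElasticConfCheck and
its methods are hypothetical names, it takes plain values instead of reading
a Hadoop Configuration, and it assumes an unset port is represented as -1.

```java
// Hypothetical helper illustrating the "cluster OR host+port" rule.
// Not taken from Nutch; names and the -1 "unset port" convention are assumptions.
final class ElasticConfCheck {

    /** Returns true when a cluster name, or a host together with a port, is present. */
    static boolean isUsable(String cluster, String host, int port) {
        boolean hasCluster = cluster != null && !cluster.trim().isEmpty();
        boolean hasHostAndPort = host != null && !host.trim().isEmpty() && port > 0;
        return hasCluster || hasHostAndPort;
    }

    /** Throws only when neither of the two documented alternatives is configured. */
    static void validate(String cluster, String host, int port) {
        if (!isUsable(cluster, host, port)) {
            throw new RuntimeException(
                "Missing elastic.cluster, or elastic.host together with elastic.port."
                + " Should be set in nutch-site.xml");
        }
    }

    public static void main(String[] args) {
        // Host and port only, as in the nutch-site.xml quoted below: accepted.
        validate(null, "localhost", 9300);

        // Nothing configured: rejected with a RuntimeException.
        try {
            validate(null, null, -1);
        } catch (RuntimeException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The point is only that the cluster check should not run on its own; it
should fail when both alternatives are absent.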

Thanks

Julien


On 11 June 2014 16:37, Jake Dodd <[email protected]> wrote:

> Hi all,
>
> The following applies to Nutch 1.8 (and at least 1.7 as well, it seems).
>
> I’ve noticed that Nutch throws an exception when the elastic.cluster
> property is not set—even when elastic.host and elastic.port are properly
> configured. In the documentation for the elastic properties, it says that
> you can either specify elastic.cluster, or specify elastic.port together
> with elastic.host.
>
> However, it seems that
> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter throws an exception
> if elastic.cluster is missing, regardless of whether elastic.port and
> elastic.host have been properly set. The exception is thrown in the
> ElasticIndexWriter.setConf() method.
>
> Is this a known bug, and has it been fixed in the trunk? I was able to get
> the Elasticsearch indexer working properly by setting elastic.host and
> elastic.port, and commenting out the if-statement beginning on line 254 in
> ElasticIndexWriter.java.
>
> For reference, here are the exception and the relevant properties in my
> nutch-site.xml.
>
>
> ***Exception***
>
> Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml
> ElasticIndexWriter
>         elastic.cluster : elastic prefix cluster
>         elastic.host : hostname
>         elastic.port : port
>         elastic.index : elastic index command
>         elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>         elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>
>         at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
>         at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
>         at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>
> ***nutch-site.xml***
>
> <property>
>   <name>elastic.host</name>
>   <value>localhost</value>
>   <description>The hostname to send documents to using TransportClient.
>   Either host and port must be defined or cluster.</description>
> </property>
>
> <property>
>   <name>elastic.port</name>
>   <value>9300</value>
>   <description>The port to connect to using TransportClient.
>   </description>
> </property>
>
> Cheers
>
> Jake




-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
