OK, great! In case anybody comes across this thread before Nutch 1.9 is released and needs to get this working: the easiest solution is to set the elastic.cluster property in nutch-site.xml in addition to elastic.host and elastic.port, rather than modifying the source.
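For example, a minimal nutch-site.xml fragment for this workaround might look like the following. This is a sketch, not an official configuration. The cluster name `elasticsearch` is an assumption: it is the Elasticsearch default `cluster.name`. If your cluster uses a different name, the value here most likely needs to match it, or the TransportClient may fail to connect. The host and port values are the ones from the original message.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Workaround for Nutch 1.7/1.8: ElasticIndexWriter.setConf() throws
       "Missing elastic.cluster" unless this property is set, even when
       elastic.host and elastic.port are configured. -->
  <property>
    <name>elastic.cluster</name>
    <!-- Assumption: the Elasticsearch default cluster name; replace with
         the cluster.name your Elasticsearch nodes actually use. -->
    <value>elasticsearch</value>
  </property>

  <!-- Host and port from the original message, kept alongside the cluster name. -->
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
  </property>

  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
</configuration>
```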
Cheers
Jake

On Jun 11, 2014, at 8:37 AM, Jake Dodd wrote:

> Hi all,
>
> The following applies to Nutch 1.8 (and, it seems, at least 1.7 as well).
>
> I’ve noticed that Nutch throws an exception when the elastic.cluster property
> is not set, even when elastic.host and elastic.port are properly configured.
> The documentation for the elastic properties says that you can either
> specify elastic.cluster, or specify elastic.port together with elastic.host.
>
> However, org.apache.nutch.indexwriter.elastic.ElasticIndexWriter appears to
> throw an exception if elastic.cluster is missing, regardless of whether
> elastic.port and elastic.host have been properly set. The exception is thrown
> in the ElasticIndexWriter.setConf() method.
>
> Is this a known bug, and has it been fixed in trunk? I was able to get the
> Elasticsearch indexer working by setting elastic.host and elastic.port and
> commenting out the if statement beginning on line 254 of
> ElasticIndexWriter.java.
>
> For reference, here are the exception and the relevant properties from my
> nutch-site.xml.
>
>
> ***Exception***
>
> Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml
> ElasticIndexWriter
>     elastic.cluster : elastic prefix cluster
>     elastic.host : hostname
>     elastic.port : port
>     elastic.index : elastic index command
>     elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>     elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>
>     at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
>     at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
>     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>
> ***nutch-site.xml***
>
> <property>
>   <name>elastic.host</name>
>   <value>localhost</value>
>   <description>The hostname to send documents to using TransportClient. Either host
>   and port must be defined or cluster.</description>
> </property>
>
> <property>
>   <name>elastic.port</name>
>   <value>9300</value>
>   <description>The port to connect to using TransportClient.</description>
> </property>
>
> Cheers
>
> Jake

