Hi Jake,

This has been fixed in trunk. See
https://github.com/apache/nutch/commit/026b2ff414bcf166de4bfeabef57f0202375ea38#diff-68fe6210481889b1947da1fe7d7ed0afL254
and https://issues.apache.org/jira/browse/NUTCH-1745

Thanks

Julien

On 11 June 2014 16:37, Jake Dodd <[email protected]> wrote:

> Hi all,
>
> The following applies to Nutch 1.8 (and at least 1.7 as well, it seems).
>
> I’ve noticed that Nutch throws an exception when the elastic.cluster
> property is not set, even when elastic.host and elastic.port are properly
> configured. In the documentation for the elastic properties, it says that
> you can either specify elastic.cluster, or specify elastic.port together
> with elastic.host.
>
> However, it seems that
> org.apache.nutch.indexwriter.elastic.ElasticIndexWriter throws an exception
> if elastic.cluster is missing, regardless of whether elastic.port and
> elastic.host have been properly set. The exception is thrown in the
> ElasticIndexWriter.setConf() method.
>
> Is this a known bug, and has it been fixed in the trunk? I was able to get
> the Elasticsearch indexer working properly by setting elastic.host and
> elastic.port, and commenting out the if-statement beginning on line 254 in
> ElasticIndexWriter.java.
>
> For reference, here are the exception and the relevant properties in my
> nutch-site.xml.
>
>
> ***Exception***
>
> Indexer: java.lang.RuntimeException: Missing elastic.cluster. Should be set in nutch-site.xml
> ElasticIndexWriter
>     elastic.cluster : elastic prefix cluster
>     elastic.host : hostname
>     elastic.port : port
>     elastic.index : elastic index command
>     elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
>     elastic.max.bulk.size : elastic bulk index length. (default 2500500 ~2.5MB)
>
>     at org.apache.nutch.indexwriter.elastic.ElasticIndexWriter.setConf(ElasticIndexWriter.java:258)
>     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:159)
>     at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:57)
>     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:91)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>
> ***nutch-site.xml***
>
> <property>
>   <name>elastic.host</name>
>   <value>localhost</value>
>   <description>The hostname to send documents to using TransportClient.
>   Either host and port must be defined or cluster.</description>
> </property>
>
> <property>
>   <name>elastic.port</name>
>   <value>9300</value>
>   <description>The port to connect to using TransportClient.</description>
> </property>
>
> Cheers
>
> Jake

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
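
The documented rule quoted in this thread (set either elastic.cluster, or elastic.host together with elastic.port) can be sketched in Java roughly as follows. This is an illustration only and is not the code from the NUTCH-1745 commit: the class name ElasticConfCheck, the method isConnectable, and the plain Map standing in for Hadoop's Configuration are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: models the documented
// "cluster OR (host AND port)" rule for the elastic.* properties.
public class ElasticConfCheck {

    /**
     * Returns true when the settings are sufficient to connect:
     * either elastic.cluster is set, or both elastic.host and
     * elastic.port are set.
     */
    public static boolean isConnectable(Map<String, String> conf) {
        boolean hasCluster = isSet(conf.get("elastic.cluster"));
        boolean hasHostAndPort =
                isSet(conf.get("elastic.host")) && isSet(conf.get("elastic.port"));
        return hasCluster || hasHostAndPort;
    }

    // A property counts as set when it is present and not blank.
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Same settings as the nutch-site.xml in the report: host and port only.
        Map<String, String> conf = new HashMap<>();
        conf.put("elastic.host", "localhost");
        conf.put("elastic.port", "9300");

        if (!isConnectable(conf)) {
            throw new RuntimeException(
                    "Missing elastic.cluster, or elastic.host and elastic.port."
                    + " Should be set in nutch-site.xml");
        }
        System.out.println("configuration accepted");
    }
}
```

With a check of this shape, the host-and-port-only configuration from the report passes, whereas the check described in the report rejected it because it looked at elastic.cluster alone.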

