I'm trying to get spark 1.6.1 to work with 2.3.2... needless to say it's not super easy.
I wish there was an easier way to get this stuff to work.. Last time I tried to use spark more I was having similar problems with classpath setup and Cassandra. Seems a huge opportunity to make this easier for new developers. This stuff isn't rocket science but it can (needlessly) waste a ton of time. ... anyway... I'm have since figured out I have to specific *specific* jars from the elasticsearch-hadoop distribution and use those. Right now I'm using : SPARK_CLASSPATH=/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-2.3.2.jar:/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-mr-2.3.2.jar:/usr/share/apache-spark/lib/* ... but I"m getting: java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.hadoop.util.Version at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:376) at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ... but I think its caused by this: 16/06/03 00:26:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-2.3.2.jar jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-mr-2.3.2.jar at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73) at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:376) at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) .. still tracking this down but was wondering if there is someting obvious I'm dong wrong. I'm going to take out elasticsearch-hadoop-2.3.2.jar and try again. Lots of trial and error here :-/ Kevin -- We’re hiring if you know of any awesome Java Devops or Linux Operations Engineers! Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>