Hi,

Sorry to hear about your troubles. Not sure whether you are aware of the ES-Hadoop docs [1]. I've raised an issue [2] to better clarify the usage of the elasticsearch-hadoop vs. elasticsearch-spark jars.

Apologies for the delayed response. For ES-Hadoop questions and issues, it's best to use the dedicated forum at https://discuss.elastic.co/c/elasticsearch-and-hadoop (see [3]).

Hope this helps,

[1] https://www.elastic.co/guide/en/elasticsearch/hadoop/2.3/spark.html
[2] https://github.com/elastic/elasticsearch-hadoop/issues/780
[3] https://www.elastic.co/guide/en/elasticsearch/hadoop/master/troubleshooting.html#help

On 6/3/16 2:06 AM, Kevin Burton wrote:
Yeah, thanks Nick. Figured that out since your last email... I deleted the 2.10 jar by accident but then put 2+2 together. Got it working now. Still sticking to my story that it's somewhat complicated to set up :)

Kevin

On Thu, Jun 2, 2016 at 3:59 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

Which Scala version is Spark built against? I'd guess it's 2.10 since you're using Spark 1.6, and you're using the 2.11 jar for es-hadoop.

On Thu, 2 Jun 2016 at 15:50 Kevin Burton <bur...@spinn3r.com> wrote:

Thanks. I'm trying to run it in a standalone cluster with an existing, large, 100-node ES install. I'm using the standard Spark 1.6.1 distribution prebuilt for Hadoop 2.6 with elasticsearch-hadoop-2.3.2... I *think* I'm only supposed to use the elasticsearch-spark_2.11-2.3.2.jar with it... but now I get the following exception:

java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:52)
    at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:37)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
    at $iwC$$iwC$$iwC.<init>(<console>:61)
    at $iwC$$iwC.<init>(<console>:63)
    at $iwC.<init>(<console>:65)
    at <init>(<console>:67)
    at .<init>(<console>:71)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

On Thu, Jun 2, 2016 at 3:45 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

Hey there,

When I used es-hadoop, I just pulled the dependency into my pom.xml, with Spark as a "provided" dependency, and built a fat jar with assembly. Then, with spark-submit, use the --jars option to include your assembly jar (IIRC I sometimes also needed --driver-class-path, but perhaps not with recent Spark versions).

On Thu, 2 Jun 2016 at 15:34 Kevin Burton <bur...@spinn3r.com> wrote:

I'm trying to get Spark 1.6.1 to work with elasticsearch-hadoop 2.3.2... needless to say it's not super easy. I wish there were an easier way to get this stuff to work. Last time I tried to use Spark more, I was having similar problems with classpath setup and Cassandra. It seems like a huge opportunity to make this easier for new developers. This stuff isn't rocket science, but it can (needlessly) waste a ton of time.

... anyway... I have since figured out I have to pick *specific* jars from the elasticsearch-hadoop distribution and use those.
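Nick's point about Scala versions comes down to one rule: the `_2.xx` suffix of the elasticsearch-spark jar must match the Scala version Spark itself was built with. The prebuilt Spark 1.6.x binaries use Scala 2.10, and a NoSuchMethodError on `scala.Predef$.ArrowAssoc` like the one above is a typical symptom of mixing Scala 2.10 and 2.11 binaries. Inside spark-shell, `scala.util.Properties.versionNumberString` reports the exact Scala version. The following is a minimal sketch of a hypothetical Python helper (not part of ES-Hadoop or Spark) that picks that single jar from a list of paths by file name only. It assumes the 2.10 build in the distribution's lib directory is named `elasticsearch-spark_2.10-2.3.2.jar`, by analogy with the 2.11 jar shown in this thread.

```python
import os
import re
from typing import Iterable, Optional

# Hypothetical helper, not part of ES-Hadoop or Spark.
# Captures the Scala binary version from names like "elasticsearch-spark_2.11-2.3.2.jar".
_SPARK_CONNECTOR = re.compile(r"^elasticsearch-spark_(\d+\.\d+)-.+\.jar$")


def scala_binary_version(full_version: str) -> str:
    """Reduce a full Scala version such as "2.10.5" to its binary version "2.10"."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def pick_spark_connector(jars: Iterable[str], scala_binary: str) -> Optional[str]:
    """Return the one elasticsearch-spark jar built for `scala_binary`, or None.

    Only file names are inspected; the jars themselves are never opened.
    """
    candidates = []
    for path in jars:
        match = _SPARK_CONNECTOR.match(os.path.basename(path))
        if match and match.group(1) == scala_binary:
            candidates.append(path)
    if len(candidates) > 1:
        raise ValueError(f"several connector jars for Scala {scala_binary}: {candidates}")
    return candidates[0] if candidates else None


if __name__ == "__main__":
    lib = "/usr/share/elasticsearch-hadoop/lib"
    jars = [
        f"{lib}/elasticsearch-hadoop-2.3.2.jar",
        f"{lib}/elasticsearch-hadoop-mr-2.3.2.jar",
        f"{lib}/elasticsearch-spark_2.10-2.3.2.jar",  # assumed name of the 2.10 build
        f"{lib}/elasticsearch-spark_2.11-2.3.2.jar",
    ]
    # Stock Spark 1.6.1 is built against Scala 2.10.x.
    print(pick_spark_connector(jars, scala_binary_version("2.10.5")))
```

Run as a script, it prints the path of the 2.10 jar. Putting only that one jar on Spark's classpath, instead of the whole lib directory, also sidesteps the duplicate-jar error reported further down.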
Right now I'm using:

SPARK_CLASSPATH=/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-2.3.2.jar:/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-mr-2.3.2.jar:/usr/share/apache-spark/lib/*

... but I'm getting:

java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.hadoop.util.Version
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:376)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ...
but I think it's caused by this:

16/06/03 00:26:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-2.3.2.jar
jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar
jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-mr-2.3.2.jar
    at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:376)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

... still tracking this down, but I was wondering if there is something obvious I'm doing wrong. I'm going to take out elasticsearch-hadoop-2.3.2.jar and try again. Lots of trial and error here :-/

Kevin

--
We’re hiring if you know of any awesome Java Devops or Linux Operations Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>
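The SPARK_CLASSPATH in Kevin's original message lists three ES-Hadoop jars (the uber jar, the MR jar and the Spark jar), which is exactly what the "Multiple ES-Hadoop versions detected in the classpath" error complains about. A cheap pre-flight check before launching Spark could flag such a classpath. The following is a minimal sketch of a hypothetical Python helper, not part of ES-Hadoop. It is a file-name heuristic only: it assumes ES-Hadoop jars are named `elasticsearch-hadoop*` or `elasticsearch-spark*`, it does not expand wildcard entries such as `/usr/share/apache-spark/lib/*`, and the check inside `org.elasticsearch.hadoop.util.Version` (per the trace above) remains the authoritative one.

```python
import os
import re
from typing import List, Optional

# Assumed naming convention, based on the jars in this thread:
# elasticsearch-hadoop-2.3.2.jar, elasticsearch-hadoop-mr-2.3.2.jar,
# elasticsearch-spark_2.11-2.3.2.jar, ...
_ES_HADOOP_JAR = re.compile(r"^elasticsearch-(?:hadoop|spark)[\w.\-]*\.jar$")


def es_hadoop_jars(classpath: str, sep: str = ":") -> List[str]:
    """Return the classpath entries whose file names look like ES-Hadoop jars.

    Wildcard entries such as "/usr/share/apache-spark/lib/*" are not expanded.
    """
    return [entry for entry in classpath.split(sep)
            if _ES_HADOOP_JAR.match(os.path.basename(entry))]


def single_es_hadoop_jar(classpath: str, sep: str = ":") -> Optional[str]:
    """Return the only ES-Hadoop jar on the classpath (or None); raise if there are several."""
    found = es_hadoop_jars(classpath, sep)
    if len(found) > 1:
        raise RuntimeError("multiple ES-Hadoop jars on the classpath, keep only one:\n  "
                           + "\n  ".join(found))
    return found[0] if found else None


if __name__ == "__main__":
    lib = "/usr/share/elasticsearch-hadoop/lib"
    # The SPARK_CLASSPATH from the original message: three ES-Hadoop jars.
    original_cp = ":".join([f"{lib}/elasticsearch-hadoop-2.3.2.jar",
                            f"{lib}/elasticsearch-spark_2.11-2.3.2.jar",
                            f"{lib}/elasticsearch-hadoop-mr-2.3.2.jar",
                            "/usr/share/apache-spark/lib/*"])
    try:
        single_es_hadoop_jar(original_cp)
    except RuntimeError as err:
        print(err)
```

Failing fast on the driver side gives the same hint as the executor-side `java.lang.Error`, but before any task is scheduled.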
--
Costin