Fair enough.

However, if you take a look at the deployment guide (
http://spark.apache.org/docs/latest/submitting-applications.html#bundling-your-applications-dependencies),
you will see that the generally advised approach is to package your app's
dependencies into a fat JAR and submit that (possibly using the --jars
option too). This also means you specify the Scala and other library
versions in your project's pom.xml or sbt build file, avoiding having to
decide manually which artefact to include on your classpath :)
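
For example, a minimal sbt build for a job like this might look as follows
(a sketch only; the artifact names and versions are illustrative, so match
them to whatever your cluster actually runs):

    // build.sbt: Spark is "provided" so it stays out of the fat JAR
    scalaVersion := "2.10.6"

    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"          % "1.6.1" % "provided",
      "org.elasticsearch" %% "elasticsearch-spark" % "2.3.2"
    )

With the sbt-assembly plugin added, "sbt assembly" then produces a single
fat JAR to hand to spark-submit, and the %% operator picks the artefact
matching your Scala version (2.10 vs 2.11) automatically.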

On Thu, 2 Jun 2016 at 16:06 Kevin Burton <bur...@spinn3r.com> wrote:

> Yeah.. thanks Nick. Figured that out since your last email... I deleted
> the 2.10 by accident but then put 2+2 together.
>
> Got it working now.
>
> Still sticking to my story that it's somewhat complicated to set up :)
>
> Kevin
>
> On Thu, Jun 2, 2016 at 3:59 PM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> Which Scala version is Spark built against? I'd guess it's 2.10 since
>> you're using spark-1.6, and you're using the 2.11 jar for es-hadoop.
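>>
>> A quick way to check, from the spark-shell prompt:
>>
>>   scala> scala.util.Properties.versionString
>>
>> That prints the Scala version the REPL (and hence your Spark build) is
>> running against.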
>>
>>
>> On Thu, 2 Jun 2016 at 15:50 Kevin Burton <bur...@spinn3r.com> wrote:
>>
>>> Thanks.
>>>
>>> I'm trying to run it in a standalone cluster with an existing, large
>>> 100-node ES install.
>>>
>>> I'm using the standard 1.6.1-2.6 distribution (Spark 1.6.1 built for
>>> Hadoop 2.6) with elasticsearch-hadoop-2.3.2...
>>>
>>> I *think* I'm only supposed to use the
>>> elasticsearch-spark_2.11-2.3.2.jar with it...
>>>
>>> but now I get the following exception:
>>>
>>>
>>> java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
>>> at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:52)
>>> at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:37)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
>>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
>>> at $iwC$$iwC$$iwC.<init>(<console>:61)
>>> at $iwC$$iwC.<init>(<console>:63)
>>> at $iwC.<init>(<console>:65)
>>> at <init>(<console>:67)
>>> at .<init>(<console>:71)
>>> at .<clinit>(<console>)
>>> at .<init>(<console>:7)
>>> at .<clinit>(<console>)
>>> at $print(<console>)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:875)
>>> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>>> at org.apache.spark.repl.Main.main(Main.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>> On Thu, Jun 2, 2016 at 3:45 PM, Nick Pentreath <nick.pentre...@gmail.com
>>> > wrote:
>>>
>>>> Hey there
>>>>
>>>> When I used es-hadoop, I just pulled the dependency into my pom.xml,
>>>> with spark as a "provided" dependency, and built a fat jar with assembly.
>>>>
>>>> Then, with spark-submit, use the --jars option to include your assembly
>>>> jar (IIRC I sometimes also needed --driver-class-path too, but perhaps
>>>> not with recent Spark versions).
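>>>>
>>>> Something along these lines (master URL, class name and jar path are
>>>> just placeholders):
>>>>
>>>>   spark-submit \
>>>>     --master spark://your-master:7077 \
>>>>     --class com.example.YourEsJob \
>>>>     your-app-assembly.jar
>>>>
>>>> adding --jars / --driver-class-path with the assembly jar if the
>>>> executors or driver still can't see the es-hadoop classes.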
>>>>
>>>>
>>>>
>>>> On Thu, 2 Jun 2016 at 15:34 Kevin Burton <bur...@spinn3r.com> wrote:
>>>>
>>>>> I'm trying to get Spark 1.6.1 to work with elasticsearch-hadoop
>>>>> 2.3.2... needless to say it's not super easy.
>>>>>
>>>>> I wish there was an easier way to get this stuff to work.. The last
>>>>> time I tried to use Spark I ran into similar problems with classpath
>>>>> setup and Cassandra.
>>>>>
>>>>> Seems like a huge opportunity to make this easier for new developers.
>>>>> This stuff isn't rocket science but it can (needlessly) waste a ton of
>>>>> time.
>>>>>
>>>>> ... anyway... I have since figured out that I have to pick *specific*
>>>>> jars from the elasticsearch-hadoop distribution and use those.
>>>>>
>>>>> Right now I'm using :
>>>>>
>>>>>
>>>>> SPARK_CLASSPATH=/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-2.3.2.jar:/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-mr-2.3.2.jar:/usr/share/apache-spark/lib/*
>>>>>
>>>>> ... but I"m getting:
>>>>>
>>>>> java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.hadoop.util.Version
>>>>> at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:376)
>>>>> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
>>>>> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
>>>>> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>
>>>>> ... but I think it's caused by this:
>>>>>
>>>>> 16/06/03 00:26:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
>>>>>
>>>>> jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-2.3.2.jar
>>>>> jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar
>>>>> jar:file:/usr/share/elasticsearch-hadoop/lib/elasticsearch-hadoop-mr-2.3.2.jar
>>>>>
>>>>> at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
>>>>> at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:376)
>>>>> at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
>>>>> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
>>>>> at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> .. still tracking this down but was wondering if there is something
>>>>> obvious I'm doing wrong.  I'm going to take out
>>>>> elasticsearch-hadoop-2.3.2.jar and try again.
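>>>>>
>>>>> i.e. cutting down to just the one spark connector jar, something like:
>>>>>
>>>>> SPARK_CLASSPATH=/usr/share/elasticsearch-hadoop/lib/elasticsearch-spark_2.11-2.3.2.jar:/usr/share/apache-spark/lib/*
>>>>>
>>>>> (if I'm reading the error right, the elasticsearch-spark jar is
>>>>> self-contained, so the -mr jar probably has to go too).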
>>>>>
>>>>> Lots of trial and error here :-/
>>>>>
>>>>> Kevin
>>>>>
>>>
>>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>
