Did you manage to make it work? I'm facing similar problems and this is a serious blocker issue. spark-submit seems kind of broken to me if you can't use it for spark-streaming.
Regards,
Luis

2014-06-11 1:48 GMT+01:00 lannyripple <lanny.rip...@gmail.com>:
> I am using Spark 1.0.0 compiled with Hadoop 1.2.1.
>
> I have a toy spark-streaming-kafka program. It reads from a kafka queue and
> does
>
>     stream
>       .map { case (k, v) => (v, 1) }
>       .reduceByKey(_ + _)
>       .print()
>
> using a 1-second interval on the stream.
>
> The docs say to make the Spark and Hadoop jars 'provided', but this breaks
> for spark-streaming. Including spark-streaming (and spark-streaming-kafka)
> as 'compile' to sweep them into our assembly gives collisions on javax.*
> classes. To work around this I modified
> $SPARK_HOME/bin/compute-classpath.sh to include spark-streaming,
> spark-streaming-kafka, and zkclient. (Note that kafka is included as
> 'compile' in my project and picked up in the assembly.)
>
> I have set up conf/spark-env.sh as needed. I have copied my assembly to
> /tmp/myjar.jar on all spark hosts and to my hdfs /tmp/jars directory. I am
> running spark-submit from my spark master, guided by the information here:
> https://spark.apache.org/docs/latest/submitting-applications.html
>
> Well, at this point I was going to detail all the ways spark-submit fails
> to follow its own documentation. If I do not invoke sparkContext.setJars()
> then it just fails to find the driver class. This is using various
> combinations of absolute path, file:, hdfs: (Warning: Skip remote jar)??,
> and local: prefixes on the application-jar and --jars arguments.
>
> If I invoke sparkContext.setJars() and include my assembly jar I get
> further. At this point I get a failure from
> kafka.consumer.ConsumerConnector not being found. I suspect this is
> because spark-streaming-kafka needs the Kafka dependency, but my assembly
> jar is too late in the classpath.
>
> At this point I try setting spark.files.userClassPathFirst to 'true', but
> this causes more things to blow up.
>
> I finally found something that works.
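For anyone hitting the same dependency-scope problem described above, one way to express it in sbt is sketched below. This is an untested sketch based on the versions named in the quoted message (Spark 1.0.0, Scala 2.10): the Spark core and streaming jars are 'provided' because the cluster's spark-assembly already contains them, while spark-streaming-kafka stays at 'compile' scope since the Spark 1.0.0 assembly does not bundle it or its kafka/zkclient dependencies.

```scala
// build.sbt — a minimal sketch, assuming Spark 1.0.0 on Scala 2.10.
// 'provided' keeps jars out of the fat assembly; they come from the cluster.
// spark-streaming-kafka must ship with the application, so it is left at
// the default 'compile' scope and swept into the assembly.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.0.0"
)
```

The trade-off is exactly the one the quoted message runs into: pulling spark-streaming-kafka into the assembly drags in transitive javax.* artifacts that collide at assembly time, which is what the mergeStrategy discussion further down is about.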
> Namely, setting the environment variable
>
>     SPARK_CLASSPATH=/tmp/myjar.jar
>
> But silly me, this is deprecated and I'm helpfully informed to
>
>     Please instead use:
>      - ./spark-submit with --driver-class-path to augment the driver classpath
>      - spark.executor.extraClassPath to augment the executor classpath
>
> which, when put into a file and introduced with --properties-file, does
> not work. (Also tried spark.files.userClassPathFirst here.) These fail
> with the kafka.consumer.ConsumerConnector error.
>
> At a guess, what's going on is that using SPARK_CLASSPATH I have my
> assembly jar in the classpath at SparkSubmit invocation:
>
>     Spark Command: java -cp /tmp/myjar.jar::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.0-hadoop1.2.1.jar:/opt/spark/lib/spark-streaming_2.10-1.0.0.jar:/opt/spark/lib/spark-streaming-kafka_2.10-1.0.0.jar:/opt/spark/lib/zkclient-0.4.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class me.KafkaStreamingWC /tmp/myjar.jar
>
> but using --properties-file, the assembly is not available for SparkSubmit.
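For reference, the replacement the deprecation notice points at would be a properties file along these lines (a sketch only; the jar path is taken from the quoted message, and as the message reports, this route still failed with the ConsumerConnector error on 1.0.0):

```properties
# spark.properties — pass with: spark-submit --properties-file spark.properties ...
# Augments the driver and executor classpaths with the application assembly.
spark.driver.extraClassPath    /tmp/myjar.jar
spark.executor.extraClassPath  /tmp/myjar.jar
```

The difference the quoted message observes is one of timing: SPARK_CLASSPATH is read by the launch scripts and so lands on the `java -cp` line of the SparkSubmit JVM itself, whereas properties-file settings are only parsed after SparkSubmit has already started.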
>
> I think the root cause is either spark-submit not handling the
> spark-streaming libraries so they can be 'provided', or the inclusion of
> org.eclipse.jetty.orbit in the streaming libraries, which causes
>
>     [error] (*:assembly) deduplicate: different file contents found in the following:
>     [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.transaction/orbits/javax.transaction-1.1.1.v201105210645.jar:META-INF/ECLIPSEF.RSA
>     [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.servlet/orbits/javax.servlet-3.0.0.v201112011016.jar:META-INF/ECLIPSEF.RSA
>     [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.mail.glassfish/orbits/javax.mail.glassfish-1.4.1.v201005082020.jar:META-INF/ECLIPSEF.RSA
>     [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.activation/orbits/javax.activation-1.1.0.v201105071233.jar:META-INF/ECLIPSEF.RSA
>
> I've tried applying mergeStrategy in assembly for my assembly.sbt, but
> then I get
>
>     Invalid signature file digest for Manifest main attributes
>
> If anyone knows the magic to get this working, a reply would be greatly
> appreciated.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-SPARK-CLASSPATH-tp7356.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
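On the mergeStrategy error quoted above: the ECLIPSEF.RSA collision and the later "Invalid signature file digest" both come from jar signature files surviving the merge, so the usual approach is to discard signature files under META-INF rather than pick one copy. A hedged sketch in the sbt-assembly syntax of that era (plugin around 0.11.x; the exact wiring may need adjusting for your plugin version):

```scala
// assembly.sbt — sketch for sbt-assembly ~0.11.x.
// Discard jar signature files (*.RSA, *.SF, *.DSA) so no stale signature
// survives into the merged assembly's META-INF; defer everything else to
// the plugin's default strategy.
import sbtassembly.Plugin._
import AssemblyKeys._

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    case PathList("META-INF", xs @ _*) if xs.lastOption.exists { f =>
      val lower = f.toLowerCase
      lower.endsWith(".rsa") || lower.endsWith(".sf") || lower.endsWith(".dsa")
    } => MergeStrategy.discard
    case other => old(other)
  }
}
```

Discarding (rather than deduplicating or keeping the first copy) matters here: a kept signature file would no longer match the merged jar's contents, which is precisely what triggers the "Invalid signature file digest" error at runtime.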