I opened a ticket on this (without posting here first - bad etiquette, apologies) which was closed as 'fixed'.
https://issues.apache.org/jira/browse/SPARK-7538

I don't believe that having my script running means this is fixed; I think it is still an issue. I downloaded the Spark source, ran `mvn -DskipTests clean package`, then simply launched my Python script (which shouldn't be introducing additional *Java* dependencies itself?). Doesn't this mean these dependencies are missing from the Spark build, since I didn't modify any files within the distribution and my application itself can't be introducing Java dependency clashes?

On Mon, May 11, 2015, 4:34 PM Lee McFadden <splee...@gmail.com> wrote:

> Ted, many thanks. I'm not used to Java dependencies so this was a real
> head-scratcher for me.
>
> Downloading the two metrics packages from the maven repository
> (metrics-core, metrics-annotation) and supplying it on the spark-submit
> command line worked.
>
> My final spark-submit for a python project using Kafka as an input source:
>
> /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
>   --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
>   --jars /home/ubuntu/jars/metrics-core-2.2.0.jar,/home/ubuntu/jars/metrics-annotation-2.2.0.jar \
>   --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
>   --master spark://127.0.0.1:7077 \
>   affected_hosts.py
>
> Now we're seeing data from the stream. Thanks again!
>
> On Mon, May 11, 2015 at 2:43 PM Sean Owen <so...@cloudera.com> wrote:
>
>> Ah yes, the Kafka + streaming code isn't in the assembly, is it? you'd
>> have to provide it and all its dependencies with your app. You could
>> also build this into your own app jar. Tools like Maven will add in
>> the transitive dependencies.
>>
>> On Mon, May 11, 2015 at 10:04 PM, Lee McFadden <splee...@gmail.com> wrote:
>> > Thanks Ted,
>> >
>> > The issue is that I'm using packages (see spark-submit definition) and I do
>> > not know how to add com.yammer.metrics:metrics-core to my classpath so Spark
>> > can see it.
>> >
>> > Should metrics-core not be part of the
>> > org.apache.spark:spark-streaming-kafka_2.10:1.3.1 package so it can work
>> > correctly?
>> >
>> > If not, any clues as to how I can add metrics-core to my project (bearing in
>> > mind that I'm using Python, not a JVM language) would be much appreciated.
>> >
>> > Thanks, and apologies for my newbness with Java/Scala.
>> >
>> > On Mon, May 11, 2015 at 1:42 PM Ted Yu <yuzhih...@gmail.com> wrote:
>> >>
>> >> com.yammer.metrics.core.Gauge is in metrics-core jar
>> >> e.g., in master branch:
>> >> [INFO] |  \- org.apache.kafka:kafka_2.10:jar:0.8.1.1:compile
>> >> [INFO] |     +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
>> >>
>> >> Please make sure metrics-core jar is on the classpath.
>> >>
>> >> On Mon, May 11, 2015 at 1:32 PM, Lee McFadden <splee...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> We've been having some issues getting spark streaming running correctly
>> >>> using a Kafka stream, and we've been going around in circles trying to
>> >>> resolve this dependency.
>> >>>
>> >>> Details of our environment and the error below, if anyone can help
>> >>> resolve this it would be much appreciated.
>> >>>
>> >>> Submit command line:
>> >>>
>> >>> /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
>> >>>   --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
>> >>>   --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
>> >>>   --master spark://127.0.0.1:7077 \
>> >>>   affected_hosts.py
>> >>>
>> >>> When we run the streaming job everything starts just fine, then we see
>> >>> the following in the logs:
>> >>>
>> >>> 15/05/11 19:50:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, ip-10-10-102-53.us-west-2.compute.internal): java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
>> >>>     at kafka.consumer.ZookeeperConsumerConnector.createFetcher(ZookeeperConsumerConnector.scala:151)
>> >>>     at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:115)
>> >>>     at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:128)
>> >>>     at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89)
>> >>>     at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
>> >>>     at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
>> >>>     at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
>> >>>     at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:298)
>> >>>     at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:290)
>> >>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
>> >>>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
>> >>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> >>>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> >>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>> >>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >>>     at java.lang.Thread.run(Thread.java:745)
>> >>> Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge
>> >>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>> >>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>> >>>     at java.security.AccessController.doPrivileged(Native Method)
>> >>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>> >>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> >>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> >>>     ... 17 more
>> >>>
>> >>>
>> >>
>> >
>>
>
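
P.S. For anyone else hitting the NoClassDefFoundError quoted above and, like me, working from Python rather than a JVM language: a jar is just a zip archive, so you can check from Python which local jars actually contain a class before passing them to --jars. This is only a rough sketch. The helper names (class_entry, jars_containing) are my own and hypothetical, not part of Spark, and it only inspects files on the local disk; it says nothing about what ends up on the executors' classpath.

```python
import zipfile


def class_entry(class_name):
    """Convert a dotted class name to its entry path inside a jar (hypothetical helper)."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name, jar_paths):
    """Return the jars, in input order, that contain the given class (hypothetical helper)."""
    entry = class_entry(class_name)
    found = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                # namelist() returns every entry path stored in the archive
                if entry in zf.namelist():
                    found.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Not a readable zip/jar (or the file is missing): skip it
            continue
    return found
```

For example, `jars_containing("com.yammer.metrics.core.Gauge", glob.glob("/home/ubuntu/jars/*.jar"))` (after `import glob`) should list the metrics-core jar if that jar really provides the class; an empty list would suggest the dependency still needs to be downloaded.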