Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-14 Thread Tathagata Das
It would be good if you could tell me what I should add to the documentation to make it easier to understand. I can update the docs for the 1.4.0 release.
On Tue, May 12, 2015 at 9:52 AM, Lee McFadden wrote:
> Thanks for explaining Sean and Cody, this makes sense now. I'd like to
> help improve this doc

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Lee McFadden
Thanks for explaining Sean and Cody, this makes sense now. I'd like to help improve this documentation so other python users don't run into the same thing, so I'll look into that today. On Tue, May 12, 2015 at 9:44 AM Cody Koeninger wrote: > One of the packages just contains the streaming-kafka

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Cody Koeninger
One of the packages just contains the streaming-kafka code. The other contains that code, plus everything it depends on. That's what "assembly" typically means in JVM land. Java/Scala users are accustomed to using their own build tool to include necessary dependencies. JVM dependency management

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Sean Owen
I think Java-land users will understand to look for an assembly jar in general, but it's not as obvious outside the Java ecosystem. Assembly = this thing, plus all its transitive dependencies. No, there is nothing wrong with Kafka at all. You need to bring everything it needs for it to work at run
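To make "assembly = this thing, plus all its transitive dependencies" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the dependency graph is a simplified toy built from coordinates mentioned in this thread (the Kafka and metrics-core versions come from the master-branch Maven output quoted elsewhere in the thread and may not match the 1.3.1 build exactly), and `transitive_closure` is a hypothetical helper. Real resolution, with version conflicts, scopes, and exclusions, is done by tools such as Maven or Ivy.

```python
from collections import deque

# Toy dependency graph (simplified assumption; coordinates taken from this thread).
DEPENDENCIES = {
    "org.apache.spark:spark-streaming-kafka_2.10:1.3.1": [
        "org.apache.kafka:kafka_2.10:0.8.1.1",
    ],
    "org.apache.kafka:kafka_2.10:0.8.1.1": [
        "com.yammer.metrics:metrics-core:2.2.0",
    ],
    "com.yammer.metrics:metrics-core:2.2.0": [],
}


def transitive_closure(root, graph):
    """Return root plus everything reachable from it, in breadth-first order."""
    seen = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    # The plain package ships only the first entry; an assembly ships all of them.
    for artifact in transitive_closure(
        "org.apache.spark:spark-streaming-kafka_2.10:1.3.1", DEPENDENCIES
    ):
        print(artifact)
```

In this toy model, the plain `spark-streaming-kafka` package corresponds to the root alone, while the assembly corresponds to the whole list, which is why `metrics-core` is only found when the assembly is used.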

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Lee McFadden
Thanks again for all the help folks. I can confirm that simply switching to `--packages org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1` makes everything work as intended. I'm not sure what the difference is between the two packages honestly, or why one should be used over the other, b
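For Python users who script their submissions, the switch described above might be sketched as follows. This is a hedged illustration, not the poster's actual command: the `--packages` flag and the assembly coordinate are the ones quoted in this message, but the helper names and the script name `my_streaming_job.py` are hypothetical, and any other options the real job needs are omitted.

```python
import shlex


def kafka_assembly_coordinate(scala_version="2.10", spark_version="1.3.1"):
    """Build the Maven coordinate of the Kafka assembly package named in this thread."""
    return (
        "org.apache.spark:spark-streaming-kafka-assembly_"
        f"{scala_version}:{spark_version}"
    )


def build_submit_command(script):
    """Return a spark-submit argument list that pulls in the assembly package."""
    return ["spark-submit", "--packages", kafka_assembly_coordinate(), script]


if __name__ == "__main__":
    # Hypothetical script name, used only for illustration.
    command = build_submit_command("my_streaming_job.py")
    print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run` unchanged; building the coordinate from the Scala and Spark versions keeps the two in step when either is upgraded.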

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Ted Yu
bq. it is already in the assembly
Yes. Verified:
$ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
  1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
On Tue, May 12, 2015 at 8:05 AM, Sean Owen wrote:
> It doesn't depend directl
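The same check can be done without a JDK on the path, since a jar is a zip archive. The following is a minimal sketch using Python's standard `zipfile` module; the function name `jar_contains_class` is a hypothetical helper, and the jar path is whatever file you have downloaded.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has an entry for the fully qualified class name."""
    # com.yammer.metrics.core.Gauge -> com/yammer/metrics/core/Gauge.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_jar.py <path-to-jar> <fully.qualified.ClassName>
    found = jar_contains_class(sys.argv[1], sys.argv[2])
    print("found" if found else "missing")
```

Run against the assembly jar with `com.yammer.metrics.core.Gauge` as the class name, this should agree with the `jar tvf` output above.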

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Sean Owen
It doesn't depend directly on yammer metrics; Kafka does. It wouldn't be correct to declare that it does; it is already in the assembly anyway.
On Tue, May 12, 2015 at 3:50 PM, Ted Yu wrote:
> Currently external/kafka/pom.xml doesn't cite yammer metrics as dependency.
>
> $ ls -l
> ~/.m2/reposito

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Ted Yu
Currently external/kafka/pom.xml doesn't cite yammer metrics as dependency.
$ ls -l ~/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
-rw-r--r-- 1 tyu staff 82123 Dec 17 2013 /Users/tyu/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
Inc

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Sean Owen
Yeah, fair point about Python. spark-streaming-kafka should not contain third-party dependencies. However there's nothing stopping the build from producing an assembly jar from these modules. I think there is an assembly target already though? On Tue, May 12, 2015 at 3:37 PM, Lee McFadden wrote:

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Cody Koeninger
I'm not a python user, but isn't that part of what the spark-streaming-kafka-assembly subproject is for? I.e. use spark-streaming-kafka-assembly_2.10:1.3.1 instead of spark-streaming-kafka_2.10:1.3.1 On Tue, May 12, 2015 at 9:37 AM, Lee McFadden wrote: > Sorry to flog this dead horse, but this

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Lee McFadden
Sorry to flog this dead horse, but this is something every python user is going to run into as we *cannot* build the dependencies onto our app. There is no way to do that with a python script. As I see it, this is not a third party integration. The package missing its dependencies is built by the

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Sean Owen
The question is really whether all the third-party integrations should be built into Spark's main assembly. I think reasonable people could disagree, but I think the current state (not built in) is reasonable. It means you have to bring the integration with you. That is, no, third-party queue inte

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
I opened a ticket on this (without posting here first - bad etiquette, apologies) which was closed as 'fixed'. https://issues.apache.org/jira/browse/SPARK-7538 I don't believe that my script now running means this is fixed; I think it is still an issue. I downloaded the spark source,

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
Ted, many thanks. I'm not used to Java dependencies so this was a real head-scratcher for me. Downloading the two metrics packages from the maven repository (metrics-core, metrics-annotation) and supplying them on the spark-submit command line worked. My final spark-submit for a python project usi

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Sean Owen
Ah yes, the Kafka + streaming code isn't in the assembly, is it? You'd have to provide it and all its dependencies with your app. You could also build this into your own app jar. Tools like Maven will add in the transitive dependencies.
On Mon, May 11, 2015 at 10:04 PM, Lee McFadden wrote:
> Than

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Ted Yu
You can use the '--jars' option of spark-submit to ship the metrics-core jar.
Cheers
On Mon, May 11, 2015 at 2:04 PM, Lee McFadden wrote:
> Thanks Ted,
>
> The issue is that I'm using packages (see spark-submit definition) and I
> do not know how to add com.yammer.metrics:metrics-core to my classpath

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
Thanks Ted, The issue is that I'm using packages (see spark-submit definition) and I do not know how to add com.yammer.metrics:metrics-core to my classpath so Spark can see it. Should metrics-core not be part of the org.apache.spark:spark-streaming-kafka_2.10:1.3.1 package so it can work correctl

Re: Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Ted Yu
com.yammer.metrics.core.Gauge is in metrics-core jar e.g., in master branch:
[INFO] |  \- org.apache.kafka:kafka_2.10:jar:0.8.1.1:compile
[INFO] |     +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
Please make sure metrics-core jar is on the classpath.
On Mon, May 11, 2015 at 1:32 PM, Lee M

Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-11 Thread Lee McFadden
Hi, We've been having some issues getting spark streaming running correctly using a Kafka stream, and we've been going around in circles trying to resolve this dependency. Details of our environment and the error are below; if anyone can help resolve this, it would be much appreciated. Submit command