Thanks for explaining, Sean and Cody; this makes sense now. I'd like to help improve the documentation so other Python users don't run into the same problem, so I'll look into that today.
On Tue, May 12, 2015 at 9:44 AM Cody Koeninger <c...@koeninger.org> wrote:

> One of the packages just contains the streaming-kafka code. The other
> contains that code, plus everything it depends on. That's what "assembly"
> typically means in JVM land.
>
> Java/Scala users are accustomed to using their own build tool to include
> necessary dependencies. JVM dependency management is (thankfully)
> different from Python dependency management.
>
> As far as I can tell, there is no core issue, upstream or otherwise.
>
> On Tue, May 12, 2015 at 11:39 AM, Lee McFadden <splee...@gmail.com> wrote:
>
>> Thanks again for all the help folks.
>>
>> I can confirm that simply switching to `--packages
>> org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1` makes
>> everything work as intended.
>>
>> I'm not sure what the difference is between the two packages honestly, or
>> why one should be used over the other, but the documentation is currently
>> not intuitive in this matter. If you follow the instructions, initially it
>> will seem broken. Is there any reason why the docs for Python users (or,
>> in fact, all users - Java/Scala users will run into this too except they
>> are armed with the ability to build their own jar with the dependencies
>> included) should not be changed to using the assembly package by default?
>>
>> Additionally, after a few google searches yesterday combined with your
>> help I'm wondering if the core issue is upstream in Kafka's dependency
>> chain?
>>
>> On Tue, May 12, 2015 at 8:53 AM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> bq. it is already in the assembly
>>>
>>> Yes. Verified:
>>>
>>> $ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
>>> 1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
>>>
>>> On Tue, May 12, 2015 at 8:05 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> It doesn't depend directly on yammer metrics; Kafka does. It wouldn't
>>>> be correct to declare that it does; it is already in the assembly
>>>> anyway.
>>>>
>>>> On Tue, May 12, 2015 at 3:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> > Currently external/kafka/pom.xml doesn't cite yammer metrics as dependency.
>>>> >
>>>> > $ ls -l ~/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>>> > -rw-r--r-- 1 tyu staff 82123 Dec 17 2013 /Users/tyu/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>>> >
>>>> > Including the metrics-core jar would not increase the size of the final
>>>> > release artifact much.
>>>> >
>>>> > My two cents.
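
For Python users who do not have the JDK's `jar` tool at hand, the `jar tvf ... | grep yammer | grep Gauge` check quoted above can be approximated with the standard library, since a jar is a zip archive. This is only a sketch: the helper name `find_entries` is hypothetical and is not part of Spark or Kafka.

```python
import zipfile


def find_entries(jar_path, *needles):
    """Return the jar entries whose path contains every string in needles.

    Hypothetical helper for illustration; it mirrors chaining several
    `grep` filters over the output of `jar tvf`.
    """
    # A jar file is a zip archive, so zipfile can list its entries.
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name
            for name in jar.namelist()
            if all(needle in name for needle in needles)
        ]
```

Called as `find_entries("spark-streaming-kafka-assembly_2.10-1.3.1.jar", "yammer", "Gauge")`, it should list `com/yammer/metrics/core/Gauge.class` if the assembly jar matches the listing quoted above; an empty list would mean the class is not bundled in that jar.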