It would be good if you could tell me what I should add to the documentation to
make it easier to understand. I can update the docs for the 1.4.0 release.
On Tue, May 12, 2015 at 9:52 AM, Lee McFadden wrote:
Thanks for explaining, Sean and Cody; this makes sense now. I'd like to
help improve this documentation so other Python users don't run into the
same thing, and I'll look into that today.
On Tue, May 12, 2015 at 9:44 AM Cody Koeninger wrote:
One of the packages just contains the streaming-kafka code. The other
contains that code, plus everything it depends on. That's what "assembly"
typically means in JVM land.
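To make that concrete, here is a rough sketch of the difference (assuming
the 1.3.1 artifacts; the empty first result is inferred from this thread
rather than re-verified):
$ jar tf spark-streaming-kafka_2.10-1.3.1.jar | grep yammer
# no output: the plain module ships only the integration classes
$ jar tf spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
com/yammer/metrics/core/Gauge.class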
Java/Scala users are accustomed to using their own build tool to include
necessary dependencies; JVM dependency management is typically handled by
the application's build rather than by Spark.
I think Java-land users will understand to look for an assembly jar in
general, but it's not as obvious outside the Java ecosystem. Assembly
= this thing, plus all its transitive dependencies.
No, there is nothing wrong with Kafka at all. You need to bring
everything it needs for it to work at runtime.
Thanks again for all the help, folks.
I can confirm that simply switching to `--packages
org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1` makes
everything work as intended.
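For anyone who lands here later, the working invocation looks roughly like
this (the master URL and script name are placeholders, not our actual
command):
$ spark-submit --master <master-url> \
    --packages org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1 \
    my_stream_job.py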
I'm honestly not sure what the difference is between the two packages, or
why one should be used over the other.
> it is already in the assembly
Yes. Verified:
$ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
On Tue, May 12, 2015 at 8:05 AM, Sean Owen wrote:
It doesn't depend directly on yammer metrics; Kafka does. It wouldn't
be correct to declare that it does; it is already in the assembly
anyway.
On Tue, May 12, 2015 at 3:50 PM, Ted Yu wrote:
Currently external/kafka/pom.xml doesn't cite yammer metrics as a dependency.
$ ls -l ~/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
-rw-r--r-- 1 tyu staff 82123 Dec 17 2013 /Users/tyu/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
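(This is easy to double-check from the root of a Spark source checkout;
grep should come back empty if the pom really doesn't reference it:)
$ grep -i yammer external/kafka/pom.xml
# expect no matches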
Yeah, fair point about Python.
spark-streaming-kafka should not contain third-party dependencies.
However, there's nothing stopping the build from producing an assembly
jar from these modules. I think there is an assembly target already,
though?
On Tue, May 12, 2015 at 3:37 PM, Lee McFadden wrote:
I'm not a Python user, but isn't that part of what the
spark-streaming-kafka-assembly subproject is for?
I.e., use
spark-streaming-kafka-assembly_2.10:1.3.1
instead of
spark-streaming-kafka_2.10:1.3.1
On Tue, May 12, 2015 at 9:37 AM, Lee McFadden wrote:
Sorry to flog this dead horse, but this is something every Python user is
going to run into, as we *cannot* build the dependencies into our app. There
is no way to do that with a Python script.
As I see it, this is not a third-party integration. The package missing its
dependencies is built by the Spark project itself.
The question is really whether all the third-party integrations should
be built into Spark's main assembly. I think reasonable people could
disagree, but I think the current state (not built in) is reasonable.
It means you have to bring the integration with you.
That is, no, third-party queue integrations are not built in.
I opened a ticket on this (without posting here first - bad etiquette,
apologies), which was closed as 'fixed':
https://issues.apache.org/jira/browse/SPARK-7538
The fact that I have my script running doesn't mean this is fixed; I
think it is still an issue.
I downloaded the Spark source,
Ted, many thanks. I'm not used to Java dependencies, so this was a real
head-scratcher for me.
Downloading the two metrics packages from the Maven repository
(metrics-core, metrics-annotation) and supplying them on the spark-submit
command line worked.
My final spark-submit for a python project usi
Ah yes, the Kafka + streaming code isn't in the assembly, is it? You'd
have to provide it and all its dependencies with your app. You could
also build this into your own app jar. Tools like Maven will add in
the transitive dependencies.
On Mon, May 11, 2015 at 10:04 PM, Lee McFadden wrote:
You can use the '--jars' option of spark-submit to ship the metrics-core jar.
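A sketch of what that could look like (the jar path and script name are
placeholders):
$ spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
    --jars /path/to/metrics-core-2.2.0.jar \
    your_script.py
Since --packages accepts a comma-separated list of Maven coordinates,
appending com.yammer.metrics:metrics-core:2.2.0 there would be another option.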
Cheers
On Mon, May 11, 2015 at 2:04 PM, Lee McFadden wrote:
Thanks, Ted.
The issue is that I'm using packages (see spark-submit definition) and I do
not know how to add com.yammer.metrics:metrics-core to my classpath so
Spark can see it.
Should metrics-core not be part of
the org.apache.spark:spark-streaming-kafka_2.10:1.3.1 package so it can
work correctly?
com.yammer.metrics.core.Gauge is in the metrics-core jar.
e.g., in master branch:
[INFO] | \- org.apache.kafka:kafka_2.10:jar:0.8.1.1:compile
[INFO] | +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
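(That output comes from Maven's dependency tree; something like the
following should reproduce it, run from the root of a Spark source
checkout:)
$ mvn -pl external/kafka dependency:tree -Dincludes=com.yammer.metrics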
Please make sure the metrics-core jar is on the classpath.
On Mon, May 11, 2015 at 1:32 PM, Lee McFadden wrote:
Hi,
We've been having some issues getting Spark Streaming running correctly
using a Kafka stream, and we've been going around in circles trying to
resolve this dependency.
Details of our environment and the error are below; if anyone can help
resolve this, it would be much appreciated.
Submit command