Yes, I do have the following dependencies marked as "provided":

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"

However, spark-streaming-kinesis-asl has a compile-time dependency on 
spark-streaming, so I think that causes spark-streaming and its dependencies 
to be pulled into the assembly.  I expected that simply excluding 
spark-streaming in the spark-streaming-kinesis-asl dependency would solve this 
problem, but it does not.  That is, this doesn't work either:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" exclude("org.apache.spark", "spark-streaming")

As I mentioned originally, the following solved some but not all conflicts:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" excludeAll(
  ExclusionRule(organization = "org.apache.hadoop"),
  ExclusionRule(organization = "org.apache.spark", name = "spark-streaming")
)

(Note that ExclusionRule(organization = "org.apache.spark") without the "name" 
attribute does not work because that apparently causes it to exclude even 
spark-streaming-kinesis-asl.)
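
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the conflict report names the artifacts with their Scala-version suffix (for example spark-streaming_2.10), and exclusion rules match the literal module name, so a rule for "spark-streaming" may never match. A minimal sketch of the same excludeAll with suffixed names follows; the _2.10 suffix and the choice of modules are taken from the conflict list, and whether this removes the remaining conflicts is untested.

```scala
// build.sbt fragment (sketch, not from the original message).
// Assumption: exclusion rules must name the artifact exactly as published,
// i.e. including the _2.10 Scala-version suffix seen in the conflict report.
libraryDependencies += ("org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0")
  .excludeAll(
    ExclusionRule(organization = "org.apache.hadoop"),
    ExclusionRule(organization = "org.apache.spark", name = "spark-streaming_2.10"),
    ExclusionRule(organization = "org.apache.spark", name = "spark-core_2.10"),
    ExclusionRule(organization = "org.apache.spark", name = "spark-network-common_2.10"),
    ExclusionRule(organization = "org.apache.spark", name = "spark-network-shuffle_2.10")
  )
```

Naming each module keeps spark-streaming-kinesis-asl itself in the assembly, which a bare organization = "org.apache.spark" rule does not.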

Jonathan Kelly
Elastic MapReduce - SDE
Port 99 (SEA35) 08.220.C2

From: Tathagata Das <t...@databricks.com>
Date: Monday, March 16, 2015 at 12:45 PM
To: Jonathan Kelly <jonat...@amazon.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: problems with spark-streaming-kinesis-asl and "sbt assembly" ("different file contents found")

If you are creating an assembly, make sure spark-streaming is marked as 
provided. spark-streaming is already part of the Spark installation, so it will 
be present at run time. That might resolve some of these conflicts.

TD

On Mon, Mar 16, 2015 at 11:30 AM, Kelly, Jonathan <jonat...@amazon.com> wrote:
I'm attempting to use the Spark Kinesis Connector, so I've added the following 
dependency in my build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"

My app works fine with "sbt run", but "sbt assembly" fails with "different 
file contents found" errors because different versions of various packages get 
pulled into the assembly.  This only occurs when I've added 
spark-streaming-kinesis-asl as a dependency; "sbt assembly" works fine 
otherwise.

Here are the conflicts that I see:

com.esotericsoftware.kryo:kryo:2.21
com.esotericsoftware.minlog:minlog:1.2

com.google.guava:guava:15.0
org.apache.spark:spark-network-common_2.10:1.3.0

(Note: The conflict is with javac.sh; why is this even getting included?)
org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
org.apache.spark:spark-streaming_2.10:1.3.0
org.apache.spark:spark-core_2.10:1.3.0
org.apache.spark:spark-network-common_2.10:1.3.0
org.apache.spark:spark-network-shuffle_2.10:1.3.0

(Note: I'm actually using my own custom-built version of Spark-1.3.0 where I've 
upgraded to v1.9.24 of the AWS Java SDK, but that has nothing to do with all of 
these conflicts, as I upgraded the dependency *because* I was getting all of 
these conflicts with the Spark 1.3.0 artifacts from the central repo.)
com.amazonaws:aws-java-sdk-s3:1.9.24
net.java.dev.jets3t:jets3t:0.9.3

commons-collections:commons-collections:3.2.1
commons-beanutils:commons-beanutils:1.7.0
commons-beanutils:commons-beanutils-core:1.8.0

commons-logging:commons-logging:1.1.3
org.slf4j:jcl-over-slf4j:1.7.10

(Note: The conflict is with a few package-info.class files, which seems really 
silly.)
org.apache.hadoop:hadoop-yarn-common:2.4.0
org.apache.hadoop:hadoop-yarn-api:2.4.0

(Note: The conflict is with org/apache/spark/unused/UnusedStubClass.class, 
which seems even more silly.)
org.apache.spark:spark-streaming-kinesis-asl_2.10:1.3.0
org.apache.spark:spark-streaming_2.10:1.3.0
org.apache.spark:spark-core_2.10:1.3.0
org.apache.spark:spark-network-common_2.10:1.3.0
org.spark-project.spark:unused:1.0.0 (?!?!?!)
org.apache.spark:spark-network-shuffle_2.10:1.3.0
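
For duplicates like the package-info.class files and UnusedStubClass.class noted above, a different technique from dependency exclusion is an sbt-assembly merge strategy, which tells the plugin which copy of a duplicated path to keep. The following is a minimal sketch, not something proposed in this thread. It assumes an sbt-assembly release that exposes the assemblyMergeStrategy key, and it assumes that keeping the first copy of these particular files is harmless.

```scala
// build.sbt fragment (sketch). Assumes the sbt-assembly plugin is enabled
// and provides the assemblyMergeStrategy key.
assemblyMergeStrategy in assembly := {
  // Stub class shipped by several Spark artifacts; keep whichever copy
  // comes first on the classpath (assumed to be safe).
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
    MergeStrategy.first
  // package-info.class files normally hold only package-level annotations
  // and documentation, so one copy is kept here as well.
  case PathList(ps @ _*) if ps.last == "package-info.class" =>
    MergeStrategy.first
  // Everything else falls back to the plugin's default strategy.
  case other =>
    val defaultStrategy = (assemblyMergeStrategy in assembly).value
    defaultStrategy(other)
}
```

Picking the first copy is riskier for real library classes such as those in guava or kryo, where the versions may genuinely differ, so those cases are deliberately left to the default strategy here.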

I can get rid of some of the conflicts by using excludeAll() to exclude 
artifacts with organization = "org.apache.hadoop" or organization = 
"org.apache.spark" and name = "spark-streaming", and I might be able to resolve 
a few other conflicts this way, but the bottom line is that this is way more 
complicated than it should be, so either something is really broken or I'm just 
doing something wrong.

Many of these don't even make sense to me.  For example, the very first 
conflict is between classes in com.esotericsoftware.kryo:kryo:2.21 and in 
com.esotericsoftware.minlog:minlog:1.2, but the former *depends* upon the 
latter, so I don't see why both would contain the same classes.  It seems 
wrong to me that one package would bundle different versions of classes that 
are already included in one of its dependencies.  I guess it doesn't make too 
much difference, though, as long as I can get my 
assembly to include/exclude the right packages.  I of course don't want any of 
the spark or hadoop dependencies included (other than 
spark-streaming-kinesis-asl itself), but I want all of 
spark-streaming-kinesis-asl's dependencies included (such as the AWS Java SDK 
and its dependencies).  That doesn't seem to be possible without what I imagine 
will become an unruly and fragile exclusion list though.
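
One way to express "leave out the Spark and Hadoop jars but keep everything else" without a per-dependency exclusion list is to filter jars out of the assembly classpath. The sketch below is an illustration only. It assumes sbt-assembly's assemblyExcludedJars key is available and that jar file names follow the usual artifact-version.jar pattern, so the name prefixes are a heuristic rather than an exact match on organization.

```scala
// build.sbt fragment (sketch). The jars returned by this block are the
// ones left OUT of the assembly.
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp.filter { entry =>
    val name = entry.data.getName
    // Keep the Kinesis connector itself in the assembly.
    val isKinesisConnector = name.startsWith("spark-streaming-kinesis-asl")
    // Assumption: Spark and Hadoop jars can be recognised by file name prefix.
    val isSparkOrHadoop = name.startsWith("spark-") || name.startsWith("hadoop-")
    isSparkOrHadoop && !isKinesisConnector
  }
}
```

This drops only the jars it names. Libraries that arrive transitively through Spark, such as kryo or guava, would still be on the classpath and may need their own handling.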

Thanks,
Jonathan
