Have you tried marking only spark-streaming-kinesis-asl as not provided, and
the rest as provided? Then you will not even need to add the kinesis-asl jar
to the spark-submit command.

TD

On Tue, Apr 14, 2015 at 2:27 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:

> Richard,
>
> Your response was very helpful and actually resolved my issue. In case
> others run into a similar issue, I followed this procedure:
>
> - Upgraded to Spark 1.3.0
> - Marked all Spark-related libraries as "provided"
> - Included the Spark transitive library dependencies
>
> where my build.sbt file contains
>
>     libraryDependencies ++= {
>       Seq(
>         "org.apache.spark" %% "spark-core"                  % "1.3.0" % "provided",
>         "org.apache.spark" %% "spark-streaming"             % "1.3.0" % "provided",
>         "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" % "provided",
>         "joda-time"        %  "joda-time"                   % "2.2",
>         "org.joda"         %  "joda-convert"                % "1.2",
>         "com.amazonaws"    %  "aws-java-sdk"                % "1.8.3",
>         "com.amazonaws"    %  "amazon-kinesis-client"       % "1.2.0")
>     }
>
> and submitting a Spark job can be done via
>
>     sh ./spark-1.3.0-bin-cdh4/bin/spark-submit --jars \
>       spark-streaming-kinesis-asl_2.10-1.3.0.jar --verbose \
>       --class com.xxx.MyClass target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
>
> Thanks again Richard!
>
> Cheers, Mike.
>
>
> On Tue, Apr 14, 2015 at 11:01 AM, Richard Marscher <rmarsc...@localytics.com> wrote:
>
>> Hi,
>>
>> I've gotten an application working with sbt-assembly and Spark, so I
>> thought I'd present an option. In my experience, trying to bundle any of
>> the Spark libraries in your uber jar is going to be a major pain. There
>> will be a lot of deduplication to work through, and even if you resolve
>> the conflicts it is easy to do so incorrectly. I considered it an
>> intractable problem. So the alternative is to not include those jars in
>> your uber jar. For this to work you will need the same libraries on the
>> classpath of your Spark cluster and your driver program (if you are
>> running that as an application and not just using spark-submit).
>>
>> As for your NoClassDefFoundError, you are either missing Joda Time in
>> your runtime classpath or have conflicting versions.
>> It looks like something related to AWS wants to use it. Check your uber
>> jar to see whether it includes org/joda/time, and also check the
>> classpath of your Spark cluster. For example, I use Spark 1.3.0 on
>> Hadoop 1.x, whose 'lib' directory contains an uber jar,
>> spark-assembly-1.3.0-hadoop1.0.4.jar. At one point in Spark 1.2 I found
>> a conflict between the httpclient version that my uber jar pulled in for
>> the AWS libraries and the one bundled in the Spark uber jar. I hand
>> patched the Spark uber jar to remove the offending httpclient bytecode
>> to resolve the issue. You may be facing a similar situation.
>>
>> I hope that gives some ideas for resolving your issue.
>>
>> Regards,
>> Rich
>>
>> On Tue, Apr 14, 2015 at 1:14 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
>>
>>> Hi Vadim,
>>>
>>> After removing "provided" from "org.apache.spark" %%
>>> "spark-streaming-kinesis-asl" I ended up with a huge number of
>>> deduplicate errors:
>>>
>>> https://gist.github.com/trienism/3d6f8d6b7ff5b7cead6a
>>>
>>> It would be nice if you could share some pieces of your mergeStrategy
>>> code for reference.
>>>
>>> Also, after adding "provided" back to "spark-streaming-kinesis-asl",
>>> when I submit the Spark job with the spark-streaming-kinesis-asl jar file
>>>
>>>     sh /usr/lib/spark/bin/spark-submit --verbose --jars \
>>>       lib/spark-streaming-kinesis-asl_2.10-1.2.0.jar \
>>>       --class com.xxx.DataConsumer target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
>>>
>>> I still end up with the following error:
>>>
>>>     Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
>>>       at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>       at java.lang.Class.newInstance(Class.java:379)
>>>
>>> Has anyone else run into this issue?
>>>
>>>
>>> On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy <vadim.bichuts...@gmail.com> wrote:
>>>
>>>> I don't believe the Kinesis asl should be provided. I used
>>>> mergeStrategy successfully to produce an "uber jar."
>>>>
>>>> FYI, I've been having trouble consuming data out of Kinesis with
>>>> Spark, with no success :(
>>>> Would be curious to know if you got it working.
>>>>
>>>> Vadim
>>>>
>>>> On Apr 13, 2015, at 9:36 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I am having trouble building a fat jar file through sbt-assembly.
>>>> [warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
>>>> [warn] Merging 'META-INF/NOTICE' with strategy 'rename'
>>>> [warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
>>>> [warn] Merging 'META-INF/LICENSE' with strategy 'rename'
>>>> [warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.properties' with strategy 'discard'
>>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.xml' with strategy 'discard'
>>>> [warn] Merging 'META-INF/services/java.sql.Driver' with strategy 'filterDistinctLines'
>>>> [warn] Merging 'rootdoc.txt' with strategy 'concat'
>>>> [warn] Strategy 'concat' was applied to a file
>>>> [warn] Strategy 'discard' was applied to 17 files
>>>> [warn] Strategy 'filterDistinctLines' was applied to a file
>>>> [warn] Strategy 'rename' was applied to 4 files
>>>>
>>>> When submitting the Spark application through the command
>>>>
>>>>     sh /usr/lib/spark/bin/spark-submit --class com.xxx.ExampleClassName \
>>>>       target/scala-2.10/xxxx-snapshot.jar
>>>>
>>>> I end up with the following error:
>>>>
>>>>     Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
>>>>       at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>       at java.lang.Class.newInstance(Class.java:379)
>>>>       at com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
>>>>       at com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
>>>>       at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
>>>>       at com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
>>>>       at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
>>>>       at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
>>>>       at com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
>>>>       at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
>>>>       at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:202)
>>>>       at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:175)
>>>>       at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:155)
>>>>       at com.quickstatsengine.aws.AwsProvider$.<init>(AwsProvider.scala:20)
>>>>       at com.quickstatsengine.aws.AwsProvider$.<clinit>(AwsProvider.scala)
>>>>
>>>> The snippet from my build.sbt file is:
>>>>
>>>>     "org.apache.spark"   %% "spark-core"                  % "1.2.0"        % "provided",
>>>>     "org.apache.spark"   %% "spark-streaming"             % "1.2.0"        % "provided",
>>>>     "com.datastax.spark" %% "spark-cassandra-connector"   % "1.2.0-alpha1" % "provided",
>>>>     "org.apache.spark"   %% "spark-streaming-kinesis-asl" % "1.2.0"        % "provided",
>>>>
>>>> And the error originates from:
>>>>
>>>>     val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
>>>>
>>>> Am I correct to set spark-streaming-kinesis-asl as a *provided* dependency?
>>>> Also, is there a merge strategy I need to apply?
>>>>
>>>> Any help would be appreciated, Mike.
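For readers landing on this thread: the mergeStrategy question is usually answered with a custom merge-strategy block in build.sbt that overrides only the conflicting META-INF entries and delegates everything else to the plugin's default. A minimal sketch, assuming sbt-assembly 0.12+ syntax (where the key is `assemblyMergeStrategy`; the 0.11.x plugin used in 2015 spelled it `mergeStrategy in assembly`) — the patterns below are illustrative, not a verified fix for this exact build:

```scala
// build.sbt -- sketch, assuming sbt-assembly 0.12+
assemblyMergeStrategy in assembly := {
  // service registrations such as java.sql.Driver must be merged line-wise
  case PathList("META-INF", "services", _*) => MergeStrategy.filterDistinctLines
  // remaining META-INF metadata (manifests, maven pom files) is safe to drop
  case PathList("META-INF", _*) => MergeStrategy.discard
  // delegate everything else to the plugin's default strategy
  case path =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(path)
}
```

Note the ordering: the `"services"` case must come before the broader `META-INF` case, or the discard rule would shadow it.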
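TD's suggestion at the top of the thread (bundle only the Kinesis connector in the assembly, keep core Spark "provided") would look roughly like this in build.sbt. Versions mirror those in the thread; this is a sketch of the scoping idea, not a verified configuration:

```scala
libraryDependencies ++= Seq(
  // core Spark stays "provided": the cluster supplies these jars at runtime
  "org.apache.spark" %% "spark-core"      % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
  // only the Kinesis connector (and its transitive AWS/Joda dependencies)
  // is bundled into the assembly, so no --jars flag is needed on submit
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
)
```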
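Richard's advice to check whether the uber jar actually contains `org/joda/time` can be scripted. A small sketch (not from the thread; the jar path and prefix below are illustrative) that lists a jar's entries under a package prefix:

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// List the entries of a jar whose names start with the given prefix,
// e.g. "org/joda/time" to see if Joda Time classes were bundled.
def entriesUnder(jarPath: String, prefix: String): List[String] = {
  val jar = new JarFile(jarPath)
  try jar.entries.asScala.map(_.getName).filter(_.startsWith(prefix)).toList
  finally jar.close()
}

// usage (hypothetical path):
//   entriesUnder("target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar", "org/joda/time")
// an empty result means the runtime classpath must supply Joda Time another way
```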