Richard, your response was very helpful and actually resolved my issue. In case others run into a similar issue, I followed this procedure:

- Upgraded to Spark 1.3.0
- Marked all Spark-related libraries as "provided"
- Included the Spark libraries' transitive dependencies explicitly, so that my build.sbt file contains:

libraryDependencies ++= {
  Seq(
    "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
    "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" % "provided",
    "joda-time" % "joda-time" % "2.2",
    "org.joda" % "joda-convert" % "1.2",
    "com.amazonaws" % "aws-java-sdk" % "1.8.3",
    "com.amazonaws" % "amazon-kinesis-client" % "1.2.0")
}
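After running sbt assembly, you can verify that the Joda Time classes actually ended up in the fat jar (per Richard's suggestion below) by listing the assembly contents, e.g.:

jar tf target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar | grep org/joda/time

which should show entries like org/joda/time/format/DateTimeFormat.class if the classes were bundled.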
A Spark job can then be submitted via:

sh ./spark-1.3.0-bin-cdh4/bin/spark-submit --jars spark-streaming-kinesis-asl_2.10-1.3.0.jar --verbose --class com.xxx.MyClass target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
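If you would rather bundle the Kinesis ASL library into the uber jar (as Vadim suggests below) instead of marking it "provided", the deduplicate errors should be resolvable with an explicit merge strategy. I did not go that route myself, so treat the following as an untested sketch for sbt-assembly 0.11.x (newer plugin versions use assemblyMergeStrategy instead); the cases simply mirror the strategies reported in the build log further down the thread:

import AssemblyKeys._ // sbt-assembly 0.11.x; put this at the top of build.sbt

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    // Service registrations must be merged line by line, not discarded
    case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
    // Duplicate manifests, notices and maven pom files are safe to drop
    case PathList("META-INF", xs @ _*)             => MergeStrategy.discard
    // Scala library rootdoc files can simply be concatenated
    case "rootdoc.txt"                             => MergeStrategy.concat
    // Fall back to the plugin defaults for everything else
    case x                                         => old(x)
  }
}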
Thanks again Richard!

Cheers, Mike.

On Tue, Apr 14, 2015 at 11:01 AM, Richard Marscher <rmarsc...@localytics.com> wrote:

> Hi,
>
> I've gotten an application working with sbt-assembly and Spark, so I thought
> I'd present an option. In my experience, trying to bundle any of the Spark
> libraries into your uber jar is going to be a major pain. There will be a lot
> of deduplication to work through, and even if you resolve it, it is easy to
> do so incorrectly; I considered it an intractable problem. The alternative is
> to not include those jars in your uber jar. For this to work, you will need
> the same libraries on the classpath of your Spark cluster and of your driver
> program (if you are running that as an application and not just using
> spark-submit).
>
> As for your NoClassDefFoundError, you are either missing Joda Time from your
> runtime classpath or have conflicting versions. It looks like something
> related to AWS wants to use it. Check whether your uber jar includes
> org/joda/time, and also check the classpath of your Spark cluster. For
> example, I use Spark 1.3.0 on Hadoop 1.x, whose 'lib' directory contains the
> uber jar spark-assembly-1.3.0-hadoop1.0.4.jar. At one point on Spark 1.2 I
> found a conflict between the httpclient version that my uber jar pulled in
> for the AWS libraries and the one bundled in the Spark uber jar. I
> hand-patched the Spark uber jar to remove the offending httpclient bytecode,
> which resolved the issue. You may be facing a similar situation.
>
> I hope that gives you some ideas for resolving your issue.
>
> Regards,
> Rich
>
> On Tue, Apr 14, 2015 at 1:14 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
>
>> Hi Vadim,
>>
>> After removing "provided" from "org.apache.spark" %%
>> "spark-streaming-kinesis-asl", I ended up with a huge number of deduplicate
>> errors:
>>
>> https://gist.github.com/trienism/3d6f8d6b7ff5b7cead6a
>>
>> It would be nice if you could share some pieces of your mergeStrategy
>> code for reference.
>>
>> Also, after adding "provided" back to "spark-streaming-kinesis-asl", even
>> when I submit the spark job with the spark-streaming-kinesis-asl jar file
>>
>> sh /usr/lib/spark/bin/spark-submit --verbose --jars
>> lib/spark-streaming-kinesis-asl_2.10-1.2.0.jar --class com.xxx.DataConsumer
>> target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
>>
>> I still end up with the following error:
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
>>   at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>   at java.lang.Class.newInstance(Class.java:379)
>>
>> Has anyone else run into this issue?
>>
>> On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy <vadim.bichuts...@gmail.com> wrote:
>>
>>> I don't believe the Kinesis ASL jar should be "provided". I used mergeStrategy
>>> successfully to produce an "uber jar."
>>>
>>> FYI, I've been trying to consume data out of Kinesis with Spark, so far
>>> with no success :(
>>> Would be curious to know if you got it working.
>>>
>>> Vadim
>>>
>>> On Apr 13, 2015, at 9:36 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
>>>
>>> Hi All,
>>>
>>> I am having trouble building a fat jar file through sbt-assembly.
>>>
>>> [warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
>>> [warn] Merging 'META-INF/NOTICE' with strategy 'rename'
>>> [warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
>>> [warn] Merging 'META-INF/LICENSE' with strategy 'rename'
>>> [warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.properties' with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.xml' with strategy 'discard'
>>> [warn] Merging 'META-INF/services/java.sql.Driver' with strategy 'filterDistinctLines'
>>> [warn] Merging 'rootdoc.txt' with strategy 'concat'
>>> [warn] Strategy 'concat' was applied to a file
>>> [warn] Strategy 'discard' was applied to 17 files
>>> [warn] Strategy 'filterDistinctLines' was applied to a file
>>> [warn] Strategy 'rename' was applied to 4 files
>>>
>>> When submitting the spark application through the command
>>>
>>> sh /usr/lib/spark/bin/spark-submit --class com.xxx.ExampleClassName
>>> target/scala-2.10/xxxx-snapshot.jar
>>>
>>> I end up with the following error:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeFormat
>>>   at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>   at java.lang.Class.newInstance(Class.java:379)
>>>   at com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
>>>   at com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
>>>   at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
>>>   at com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
>>>   at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
>>>   at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
>>>   at com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
>>>   at com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
>>>   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:202)
>>>   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:175)
>>>   at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:155)
>>>   at com.quickstatsengine.aws.AwsProvider$.<init>(AwsProvider.scala:20)
>>>   at com.quickstatsengine.aws.AwsProvider$.<clinit>(AwsProvider.scala)
>>>
>>> The snippet from my build.sbt file is:
>>>
>>>     "org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
>>>     "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided",
>>>     "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.0-alpha1" % "provided",
>>>     "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.2.0" % "provided",
>>>
>>> And the error originates from:
>>>
>>>     val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
>>>
>>> Am I correct to set spark-streaming-kinesis-asl as a *provided* dependency?
>>> Also, is there a merge strategy I need to apply?
>>>
>>> Any help would be appreciated, Mike.