Just remove "provided" from the end of the line where you specify the spark-streaming-kinesis-asl dependency. That will cause the package and all of its transitive dependencies (including the KCL and the AWS Java SDK libraries) to be included in your "uber jar". They all must be in there because they are not part of the Spark distribution on your cluster.
However, as I mentioned before, I think making this change might cause you to run into the same problems I spoke of in the thread I linked below (https://www.mail-archive.com/user@spark.apache.org/msg23891.html), and unfortunately I haven't solved that yet.

~ Jonathan Kelly

From: Vadim Bichutskiy <vadim.bichuts...@gmail.com>
Date: Friday, April 3, 2015 at 12:45 PM
To: Jonathan Kelly <jonat...@amazon.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark + Kinesis

Thanks. So how do I fix it?

On Fri, Apr 3, 2015 at 3:43 PM, Kelly, Jonathan <jonat...@amazon.com> wrote:

spark-streaming-kinesis-asl is not part of the Spark distribution on your cluster, so you cannot have it be just a "provided" dependency. This is also why the KCL and its dependencies were not included in the assembly (but yes, they should be).

~ Jonathan Kelly

From: Vadim Bichutskiy <vadim.bichuts...@gmail.com>
Date: Friday, April 3, 2015 at 12:26 PM
To: Jonathan Kelly <jonat...@amazon.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark + Kinesis

Hi all,

Good news! I was able to create a Kinesis consumer and assemble it into an "uber jar" following http://spark.apache.org/docs/latest/streaming-kinesis-integration.html and the example at https://github.com/apache/spark/blob/master/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala.
However, when I try to spark-submit it I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider

Do I need to include the KCL dependency in build.sbt? Here's what it looks like currently:

import AssemblyKeys._

name := "Kinesis Consumer"
version := "1.0"
organization := "com.myconsumer"
scalaVersion := "2.11.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" % "provided"

assemblySettings
jarName in assembly := "consumer-assembly.jar"
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala=false)

Any help appreciated.

Thanks,
Vadim

On Thu, Apr 2, 2015 at 1:15 PM, Kelly, Jonathan <jonat...@amazon.com> wrote:

It looks like you're attempting to mix Scala versions, so that's going to cause some problems. If you really want to use Scala 2.11.5, you must also use Spark package versions built for Scala 2.11 rather than 2.10.

Anyway, that's not quite the correct way to specify Scala dependencies in build.sbt. Instead of placing the Scala version after the artifactId (like "spark-core_2.10"), what you actually want is to use just "spark-core" with two percent signs (%%) before it. Using two percent signs will make sbt use the version of Scala that matches your declared scalaVersion.
For example:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"

I think that may get you a little closer, though I think you're probably going to run into the same problems I ran into in this thread: https://www.mail-archive.com/user@spark.apache.org/msg23891.html I never really got an answer for that, and I temporarily moved on to other things for now.

~ Jonathan Kelly

From: 'Vadim Bichutskiy' <vadim.bichuts...@gmail.com>
Date: Thursday, April 2, 2015 at 9:53 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Spark + Kinesis

Hi all,

I am trying to write an Amazon Kinesis consumer Scala app that processes data in the Kinesis stream. Is this the correct way to specify build.sbt?

-------
import AssemblyKeys._

name := "Kinesis Consumer"
version := "1.0"
organization := "com.myconsumer"
scalaVersion := "2.11.5"

libraryDependencies ++= Seq("org.apache.spark" % "spark-core_2.10" % "1.3.0" % "provided",
  "org.apache.spark" % "spark-streaming_2.10" % "1.3.0"
  "org.apache.spark" % "spark-streaming-kinesis-asl_2.10" % "1.3.0")

assemblySettings
jarName in assembly := "consumer-assembly.jar"
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala=false)
--------

In project/assembly.sbt I have only the following line:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

I am using sbt 0.13.7. I adapted Example 7.7 in the Learning Spark book.

Thanks,
Vadim
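[Editor's note: pulling the advice in this thread together, a corrected build.sbt would use %% throughout (so sbt appends the Scala binary version matching scalaVersion) and keep "provided" only on the artifacts the cluster already ships. This is an untested sketch assembled from the messages above, not a build Jonathan or Vadim confirmed working:]

```scala
// build.sbt -- sketch combining the fixes discussed in this thread
// (old sbt-assembly 0.x settings syntax, as used by the original poster)
import AssemblyKeys._

name := "Kinesis Consumer"
version := "1.0"
organization := "com.myconsumer"
scalaVersion := "2.11.5"

// %% resolves e.g. spark-core_2.11 automatically from scalaVersion
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided"

// No "provided" here: the KCL and AWS SDK must land in the uber jar,
// since they are not part of the cluster's Spark distribution
libraryDependencies += "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"

assemblySettings
jarName in assembly := "consumer-assembly.jar"
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
```

[One caveat: a spark-streaming-kinesis-asl artifact built for Scala 2.11 may not exist for every Spark release; if resolution fails, check Maven Central and fall back to a 2.10.x scalaVersion so all three artifacts match.]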