Thanks for answering, Daniil - I have SBT version 0.13.5. Is that an old version? It seems pretty up-to-date.
It turns out I figured out a way around this entire problem: just use 'sbt package', and when using bin/spark-submit, pass it the "--jars" option and GIVE IT ALL THE JARS from the local ivy2 cache. Pretty inelegant, but at least I am able to develop, and when I want to make a super JAR with sbt assembly I can use the stupidly slow method.

Here is the important snippet for grabbing all the JARs from the local ivy2 cache:

--jars $(find ~/.ivy2/cache/ -iname '*.jar' | tr '\n' ,)

Here's the entire running command:

bin/spark-submit --master local[*] --jars $(find /home/data/.ivy2/cache/ -iname '*.jar' | tr '\n' ,) --class KafkaStreamConsumer ~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar node1:2181 my-consumer-group aris-topic 1

This is fairly bad, but it works around sbt assembly being incredibly slow.

On Tue, Sep 2, 2014 at 2:13 PM, Daniil Osipov <daniil.osi...@shazam.com> wrote:

> What version of sbt are you using? There is a bug in early versions of 0.13
> that causes assembly to be extremely slow - make sure you're using the
> latest one.
>
>
> On Fri, Aug 29, 2014 at 1:30 PM, Aris <> wrote:
>
>> Hi folks,
>>
>> I am trying to use Kafka with Spark Streaming, and it appears I cannot do
>> the normal 'sbt package' as I do with other Spark applications, such as
>> Spark alone or Spark with MLlib. I learned I have to build with the
>> sbt-assembly plugin.
>>
>> OK, so here is my build.sbt file for my extremely simple test Kafka/Spark
>> Streaming project. It takes almost 30 minutes to build! This is a CentOS
>> Linux machine on SSDs with 4GB of RAM, it's never been slow for me. To
>> compare, sbt assembly for the entire Spark project itself takes less than
>> 10 minutes.
>>
>> At the bottom of this file I am trying to play with 'cacheOutput'
>> options, because I read online that maybe I am calculating SHA-1 for all
>> the *.class files in this super JAR.
>>
>> I also copied the mergeStrategy from Spark contributor TD's Spark Streaming
>> tutorial from Spark Summit 2014.
>>
>> Again, is there some better way to build this JAR file, just using sbt
>> package? This process is working, but very slow.
>>
>> Any help with speeding up this compilation is really appreciated!!
>>
>> Aris
>>
>> -----------------------------------------
>>
>> import AssemblyKeys._ // put this at the top of the file
>>
>> name := "streamingKafka"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.4"
>>
>> libraryDependencies ++= Seq(
>>   "org.apache.spark" %% "spark-core" % "1.0.1" % "provided",
>>   "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided",
>>   "org.apache.spark" %% "spark-streaming-kafka" % "1.0.1"
>> )
>>
>> assemblySettings
>>
>> jarName in assembly := "streamingkafka-assembly.jar"
>>
>> mergeStrategy in assembly := {
>>   case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
>>   case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
>>   case "log4j.properties" => MergeStrategy.discard
>>   case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
>>   case "reference.conf" => MergeStrategy.concat
>>   case _ => MergeStrategy.first
>> }
>>
>> assemblyOption in assembly ~= { _.copy(cacheOutput = false) }
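
For anyone reusing the --jars workaround from the top of this message, here is a slightly more defensive sketch of the same idea, not a recipe tested on every setup. It quotes the '*.jar' pattern so the shell cannot expand it before find sees it, and it joins the paths with paste so the list has no trailing comma. The function name jar_list is made up for illustration, and the sketch assumes no JAR path contains a comma or a newline.

```shell
#!/bin/sh
# jar_list DIR: print every *.jar under DIR as one comma-separated line.
# Hypothetical helper name; assumes paths contain no commas or newlines.
jar_list() {
    # -iname matches case-insensitively, so .JAR files are included too.
    # Sorting in the C locale keeps the order stable between runs.
    find "$1" -iname '*.jar' | LC_ALL=C sort | paste -sd, -
}

# Usage with the command from this thread (left commented out on purpose):
# bin/spark-submit --master 'local[*]' \
#   --jars "$(jar_list /home/data/.ivy2/cache)" \
#   --class KafkaStreamConsumer \
#   ~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar \
#   node1:2181 my-consumer-group aris-topic 1
```

Passing the whole ivy2 cache can put several versions of the same library on the classpath, so pointing jar_list at a narrower directory may be worth trying if conflicts show up.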