The "magic incantation" is "sbt assembly" (not "assemble").
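For anyone hitting this thread from a search, the wiring that trips people up below is just one line plus one task. A sketch, assuming sbt-assembly 0.11.4 (the version quoted later in the thread):

```scala
// project/plugins.sbt -- registers the sbt-assembly plugin.
// Any *.sbt file under project/ is loaded; "plugins.sbt" is the usual name.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
```

With that file in place, `sbt assembly` (or `sbt/sbt assembly` inside a Spark checkout) is a valid task; typing `sbt assemble` instead produces exactly the "Not a valid command: assemble" errors quoted further down.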
Actually I find Maven with its assembly plugins to be very easy (mvn package). I can send a pom.xml for a skeleton project if you need.

— Sent from Mailbox

On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:

> Hmm.. That's not working so well for me. First, I needed to add a
> "project/plugin.sbt" file with the contents:
>
>     addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
>
> before 'sbt/sbt assemble' worked at all. And I'm not sure about that
> version number: "0.9.1" isn't working much better, and "0.11.4" is the
> latest one recommended by the sbt project site. Where did you get your
> version from?
>
> Second, even when I do get it to build a .jar, spark-submit is still
> telling me the external.twitter library is missing.
>
> I tried using your github project as-is, but it also complained about the
> missing plugin. I'm trying it with various versions now to see if I can
> get that working, even though I don't know anything about Kafka. Hmm, and
> no. Here's what I get:
>
>     [info] Set current project to Simple Project (in build
>     file:/home/ubuntu/spark-1.0.0/SparkKafka/)
>     [error] Not a valid command: assemble
>     [error] Not a valid project ID: assemble
>     [error] Expected ':' (if selecting a configuration)
>     [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
>     assemblyDirectory)
>     [error] assemble
>     [error]
>
> I also found this project, which seemed to be exactly what I was after:
>
>     https://github.com/prabeesh/SparkTwitterAnalysis
>
> ...but it was for Spark 0.9, and though I updated all the version
> references to "1.0.0", that one doesn't work either. I can't even get it
> to build. *sigh*
>
> Is it going to be easier to just copy the external/ source code into my
> own project? Because I will... especially if creating "uberjars" takes
> this long every... single... time...
>
> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <unorthodox.engine...@gmail.com>
> wrote:
>
>> Thanks Patrick!
>>
>> Uberjars. Cool.
>> I'd actually heard of them. And thanks for the link to the
>> example! I shall work through that today.
>>
>> I'm still learning sbt and its many options... the last new framework I
>> learned was node.js, and I think I've been rather spoiled by "npm".
>>
>> At least it's not Maven. Please, oh please don't make me learn Maven too.
>> (The only people who seem to like it have Software Stockholm Syndrome: "I
>> know Maven kidnapped me and beat me up, but if you spend long enough with
>> it, you eventually start to sympathize and see its point of view".)
>>
>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>>> Hey Jeremy,
>>>
>>> The issue is that you are using one of the external libraries, and
>>> these aren't actually packaged with Spark on the cluster, so you need
>>> to create an uber jar that includes them.
>>>
>>> You can look at the example here (I recently did this for a Kafka
>>> project, and the idea is the same):
>>>
>>>     https://github.com/pwendell/kafka-spark-example
>>>
>>> You'll want to make an uber jar that includes these packages (run "sbt
>>> assembly") and then submit that jar to spark-submit. Also, I'd try
>>> running it locally first (if you aren't already) just to make the
>>> debugging simpler.
>>>
>>> - Patrick
>>>
>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>>> > Ah sorry, this may be the thing I learned for the day. The issue is
>>> > that classes from that particular artifact are missing, though. Worth
>>> > interrogating the resulting .jar file with "jar tf" to see if it made
>>> > it in?
>>> >
>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath
>>> > <nick.pentre...@gmail.com> wrote:
>>> >> @Sean, the %% syntax in SBT should automatically add the Scala major
>>> >> version qualifier (_2.10, _2.11, etc.) for you, so that does appear
>>> >> to be correct syntax for the build.
>>> >>
>>> >> I seemed to run into this issue with some missing Jackson deps, and
>>> >> solved it by including the jar explicitly on the driver class path:
>>> >>
>>> >>     bin/spark-submit --driver-class-path
>>> >>     SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >>     --class "SimpleApp"
>>> >>     SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >>
>>> >> Seems redundant to me, since I thought that the JAR passed as the
>>> >> argument is copied to the driver and made available. But this solved
>>> >> it for me, so perhaps give it a try?
>>> >>
>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>>
>>> >>> Those aren't the names of the artifacts:
>>> >>>
>>> >>>     http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>> >>>
>>> >>> The name is "spark-streaming-twitter_2.10".
>>> >>>
>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>> >>> <unorthodox.engine...@gmail.com> wrote:
>>> >>> > Man, this has been hard going. Six days, and I finally got a
>>> >>> > "Hello World" app working that I wrote myself.
>>> >>> >
>>> >>> > Now I'm trying to make a minimal streaming app based on the
>>> >>> > twitter examples (running standalone right now while learning),
>>> >>> > and when running it like this:
>>> >>> >
>>> >>> >     bin/spark-submit --class "SimpleApp"
>>> >>> >     SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >>> >
>>> >>> > I'm getting this error:
>>> >>> >
>>> >>> >     Exception in thread "main" java.lang.NoClassDefFoundError:
>>> >>> >     org/apache/spark/streaming/twitter/TwitterUtils$
>>> >>> >
>>> >>> > Which I'm guessing is because I haven't put in a dependency on
>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs
>>> >>> > on it.
>>> >>> > Here's my build file so far:
>>> >>> >
>>> >>> > simple.sbt
>>> >>> > ------------------------------------------
>>> >>> > name := "Simple Project"
>>> >>> >
>>> >>> > version := "1.0"
>>> >>> >
>>> >>> > scalaVersion := "2.10.4"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>> >>> >
>>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>> >>> > ------------------------------------------
>>> >>> >
>>> >>> > I've tried a few obvious things like adding:
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"
>>> >>> >
>>> >>> > because, well, that would match the naming scheme implied so far,
>>> >>> > but it errors.
>>> >>> >
>>> >>> > Also, I just realized I don't completely understand whether:
>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
>>> >>> > (b) the "spark-submit" command sends a _job_ to the workers, which
>>> >>> > are supposed to already have the jar file installed (or in hdfs), or
>>> >>> > (c) the Context is supposed to list the jars to be distributed.
>>> >>> > (Is that deprecated?)
>>> >>> >
>>> >>> > One part of the documentation says:
>>> >>> >
>>> >>> > "Once you have an assembled jar you can call the bin/spark-submit
>>> >>> > script as shown here while passing your jar."
>>> >>> >
>>> >>> > but another says:
>>> >>> >
>>> >>> > "application-jar: Path to a bundled jar including your application
>>> >>> > and all dependencies.
>>> >>> > The URL must be globally visible inside of
>>> >>> > your cluster, for instance, an hdfs:// path or a file:// path
>>> >>> > that is present on all nodes."
>>> >>> >
>>> >>> > I suppose both could be correct if you take a certain point of view.
>>> >>> >
>>> >>> > --
>>> >>> > Jeremy Lee BCompSci(Hons)
>>> >>> > The Unorthodox Engineers
>>
>> --
>> Jeremy Lee BCompSci(Hons)
>> The Unorthodox Engineers
>
> --
> Jeremy Lee BCompSci(Hons)
> The Unorthodox Engineers
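To tie the thread together: the artifact names in the simple.sbt above are already right, and the missing step is assembling spark-streaming-twitter into the submitted jar. Here is one way that can look, as an untested sketch for sbt-assembly 0.11.x; marking the Spark artifacts that already live on the cluster as "provided" is a common convention, not something prescribed in the thread:

```scala
// build.sbt -- sketch; sbt-assembly 0.11.x wires in via assemblySettings
import AssemblyKeys._

assemblySettings

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// spark-core and spark-streaming are on the cluster already ("provided");
// spark-streaming-twitter and twitter4j must end up inside the uber jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"              % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming"         % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0",
  "org.twitter4j"     % "twitter4j-stream"        % "3.0.3"
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

Running `sbt assembly` then produces an uber jar under `target/scala-2.10/` (the exact file name depends on the plugin's jar-name setting) to hand to `bin/spark-submit --class "SimpleApp" <path-to-assembly-jar>`. Listing that jar with `jar tf` should show the `org/apache/spark/streaming/twitter/` classes, which is exactly the check Sean suggested above.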