The "magic incantation" is "sbt assembly" (not "assemble").


Actually I find Maven with its assembly plugin to be very easy (mvn
package). I can send a pom.xml for a skeleton project if you need one.
—
Sent from Mailbox
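For reference, a minimal sbt setup for this — a sketch only, assuming sbt-assembly 0.11.4 and Spark 1.0.0; marking the Spark artifacts "provided" is my suggestion to keep Spark's own jars out of the uber jar, since the cluster already supplies them:

```scala
// project/plugins.sbt -- makes the "assembly" task available to sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")

// build.sbt
import AssemblyKeys._  // sbt-assembly 0.11.x

assemblySettings       // wires the assembly task into this build

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// "provided": already on the cluster, so keep them out of the uber jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

// NOT bundled with Spark on the cluster -- this one must go into the uber jar
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
```

Running "sbt assembly" then drops the uber jar under target/scala-2.10/, and that is the jar to hand to spark-submit.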

On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee <unorthodox.engine...@gmail.com>
wrote:

> Hmm.. That's not working so well for me. First, I needed to add a
> "project/plugin.sbt" file with the contents:
> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
> before 'sbt/sbt assemble' worked at all. And I'm not sure about that
> version number, but "0.9.1" isn't working much better and "0.11.4" is the
> latest one recommended by the sbt project site. Where did you get your
> version from?
> Second, even when I do get it to build a .jar, spark-submit is still
> telling me the external.twitter library is missing.
> I tried using your github project as-is, but it also complained about the
> missing plugin.. I'm trying it with various versions now to see if I can
> get that working, even though I don't know anything about kafka. Hmm, and
> no. Here's what I get:
> [info] Set current project to Simple Project (in build
> file:/home/ubuntu/spark-1.0.0/SparkKafka/)
> [error] Not a valid command: assemble
> [error] Not a valid project ID: assemble
> [error] Expected ':' (if selecting a configuration)
> [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
> assemblyDirectory)
> [error] assemble
> [error]
> I also found this project which seemed to be exactly what I was after:
> https://github.com/prabeesh/SparkTwitterAnalysis
> ...but it was for Spark 0.9, and though I updated all the version
> references to "1.0.0", that one doesn't work either. I can't even get it to
> build.
> *sigh*
> Is it going to be easier to just copy the external/ source code into my own
> project? Because I will... especially if creating "Uberjars" takes this
> long every... single... time...
> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <unorthodox.engine...@gmail.com>
> wrote:
>> Thanks Patrick!
>>
>> Uberjars. Cool. I'd actually heard of them. And thanks for the link to the
>> example! I shall work through that today.
>>
>> I'm still learning sbt and its many options... the last new framework I
>> learned was node.js, and I think I've been rather spoiled by "npm".
>>
>> At least it's not maven. Please, oh please don't make me learn maven too.
>> (The only people who seem to like it have Software Stockholm Syndrome: "I
>> know maven kidnapped me and beat me up, but if you spend long enough with
>> it, you eventually start to sympathize and see its point of view".)
>>
>>
>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>>> Hey Jeremy,
>>>
>>> The issue is that you are using one of the external libraries and
>>> these aren't actually packaged with Spark on the cluster, so you need
>>> to create an uber jar that includes them.
>>>
>>> You can look at the example here (I recently did this for a kafka
>>> project and the idea is the same):
>>>
>>> https://github.com/pwendell/kafka-spark-example
>>>
>>> You'll want to make an uber jar that includes these packages (run sbt
>>> assembly) and then submit that jar to spark-submit. Also, I'd try
>>> running it locally first (if you aren't already) just to make the
>>> debugging simpler.
>>>
>>> - Patrick
>>>
>>>
>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
>>> > Ah sorry, this may be the thing I learned for the day. The issue is
>>> > that classes from that particular artifact are missing though. Worth
>>> > interrogating the resulting .jar file with "jar tf" to see if it made
>>> > it in?
>>> >
>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>> >> @Sean, the %% syntax in SBT should automatically add the Scala major
>>> >> version qualifier (_2.10, _2.11 etc) for you, so that does appear to be
>>> >> correct syntax for the build.
>>> >>
>>> >> I seemed to run into this issue with some missing Jackson deps, and
>>> >> solved it by including the jar explicitly on the driver class path:
>>> >>
>>> >> bin/spark-submit --driver-class-path
>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >> --class "SimpleApp"
>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >>
>>> >> Seems redundant to me, since I thought the JAR passed as an argument is
>>> >> copied to the driver and made available. But this solved it for me, so
>>> >> perhaps give it a try?
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>>
>>> >>> Those aren't the names of the artifacts:
>>> >>>
>>> >>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>> >>>
>>> >>> The name is "spark-streaming-twitter_2.10"
>>> >>>
>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>> >>> <unorthodox.engine...@gmail.com> wrote:
>>> >>> > Man, this has been hard going. Six days, and I finally got a "Hello
>>> >>> > World"
>>> >>> > App working that I wrote myself.
>>> >>> >
>>> >>> > Now I'm trying to make a minimal streaming app based on the twitter
>>> >>> > examples (running standalone right now while learning), and when
>>> >>> > running it like this:
>>> >>> >
>>> >>> > bin/spark-submit --class "SimpleApp"
>>> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >>> >
>>> >>> > I'm getting this error:
>>> >>> >
>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>> >>> >
>>> >>> > Which I'm guessing is because I haven't put in a dependency to
>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
>>> >>> > Here's my build file so far:
>>> >>> >
>>> >>> > simple.sbt
>>> >>> > ------------------------------------------
>>> >>> > name := "Simple Project"
>>> >>> >
>>> >>> > version := "1.0"
>>> >>> >
>>> >>> > scalaVersion := "2.10.4"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>> >>> >
>>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>> >>> > ------------------------------------------
>>> >>> >
>>> >>> > I've tried a few obvious things like adding:
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>> >>> >
>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"
>>> >>> >
>>> >>> > because, well, that would match the naming scheme implied so far, but
>>> >>> > it errors.
>>> >>> >
>>> >>> >
>>> >>> > Also, I just realized I don't completely understand if:
>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
>>> >>> > (b) the "spark-submit" command sends a _job_ to the workers, which are
>>> >>> > supposed to already have the jar file installed (or in hdfs), or
>>> >>> > (c) the Context is supposed to list the jars to be distributed (is that
>>> >>> > deprecated?)
>>> >>> >
>>> >>> > One part of the documentation says:
>>> >>> >
>>> >>> > "Once you have an assembled jar you can call the bin/spark-submit
>>> >>> > script as shown here while passing your jar."
>>> >>> >
>>> >>> > but another says:
>>> >>> >
>>> >>> > "application-jar: Path to a bundled jar including your application and
>>> >>> > all dependencies. The URL must be globally visible inside of your
>>> >>> > cluster, for instance, an hdfs:// path or a file:// path that is
>>> >>> > present on all nodes."
>>> >>> >
>>> >>> > I suppose both could be correct if you take a certain point of view.
>>> >>> >
>>> >>> > --
>>> >>> > Jeremy Lee  BCompSci(Hons)
>>> >>> >   The Unorthodox Engineers
>>> >>
>>> >>
>>>
>>
>>
>>
>> --
>> Jeremy Lee  BCompSci(Hons)
>>   The Unorthodox Engineers
>>
> -- 
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
