Re: Purpose of spark-submit?

Koert Kuipers Wed, 09 Jul 2014 09:16:21 -0700

sandy, that makes sense. however i had trouble doing programmatic execution
on yarn in client mode as well. the application-master in yarn came up but
then bombed because it was looking for jars that don't exist (it was looking
in the original file paths on the driver side, which are not available on
the yarn node). my guess is that spark-submit is changing some settings
(perhaps preparing the distributed cache and modifying settings
accordingly), which makes it harder to run things programmatically. i could
be wrong however. i gave up debugging and resorted to using spark-submit
for now.
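
For anyone taking the same fallback from inside a larger program, here is a
minimal sketch of shelling out to spark-submit, the option Suren mentions in
the quoted thread below. It only assembles the command line and, optionally,
runs it. The flags used (--master, --deploy-mode, --class, --jars) are
spark-submit options; the jar paths, the class name, and the helper names
build_submit_command and submit are made up for illustration, and the master
string "yarn-client" is an assumption that depends on your Spark version.

```python
import subprocess


def build_submit_command(app_jar, main_class, master, deploy_mode=None,
                         extra_jars=(), app_args=(),
                         spark_submit="spark-submit"):
    """Return the argv list for a spark-submit call (nothing is executed)."""
    cmd = [spark_submit, "--master", master, "--class", main_class]
    if deploy_mode is not None:
        cmd += ["--deploy-mode", deploy_mode]
    if extra_jars:
        # spark-submit takes dependency jars as one comma-separated value.
        cmd += ["--jars", ",".join(extra_jars)]
    # Options go before the application jar; everything after the jar is
    # passed through to the application's main method.
    cmd.append(app_jar)
    cmd.extend(app_args)
    return cmd


def submit(cmd):
    """Run the command, wait for it to finish, and return its exit code."""
    return subprocess.call(cmd)


if __name__ == "__main__":
    cmd = build_submit_command(
        app_jar="/path/to/my-app-assembly.jar",  # hypothetical path
        main_class="com.example.MyJob",          # hypothetical class
        master="yarn-client",                    # assumption: version-dependent
        extra_jars=["/path/to/dep1.jar", "/path/to/dep2.jar"],
        app_args=["input", "output"],
    )
    print(" ".join(cmd))
    # exit_code = submit(cmd)  # uncomment where spark-submit is on the PATH
```

Keeping command construction separate from execution makes the exact
invocation easy to log and inspect before anything is launched.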




On Wed, Jul 9, 2014 at 12:05 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> Spark still supports the ability to submit jobs programmatically without
> shell scripts.
>
> Koert,
> The main reason that the unification can't be a part of SparkContext is
> that YARN and standalone support deploy modes where the driver runs in a
> managed process on the cluster.  In this case, the SparkContext is created
> on a remote node well after the application is launched.
>
>
> On Wed, Jul 9, 2014 at 8:34 AM, Andrei <faithlessfri...@gmail.com> wrote:
>
>> Another +1. For me it's a question of embedding. With
>> SparkConf/SparkContext I can easily create larger projects with Spark as a
>> separate service (just like MySQL and JDBC, for example). With spark-submit
>> I'm bound to Spark as a main framework that defines what my application
>> should look like. In my humble opinion, using Spark as an embeddable library
>> rather than as the main framework and runtime is much easier.
>>
>>
>>
>>
>> On Wed, Jul 9, 2014 at 5:14 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> +1 as well for being able to submit jobs programmatically without using
>>> a shell script.
>>>
>>> We also experience issues submitting jobs programmatically without
>>> using spark-submit. In fact, even in the Hadoop world, I rarely used
>>> "hadoop jar" to submit jobs from the shell.
>>>
>>>
>>>
>>> On Wed, Jul 9, 2014 at 9:47 AM, Robert James <srobertja...@gmail.com>
>>> wrote:
>>>
>>>> +1 to being able to do anything via SparkConf/SparkContext.  Our app
>>>> worked fine in Spark 0.9, but, after several days of wrestling with
>>>> uber jars and spark-submit, and so far failing to get Spark 1.0
>>>> working, we'd like to go back to doing it ourselves with SparkConf.
>>>>
>>>> As the previous poster said, a few scripts should be able to give us
>>>> the classpath and any other params we need, and be a lot more
>>>> transparent and debuggable.
>>>>
>>>> On 7/9/14, Surendranauth Hiraman <suren.hira...@velos.io> wrote:
>>>> > Are there any gaps beyond convenience and code/config separation in
>>>> > using spark-submit versus SparkConf/SparkContext if you are willing to
>>>> > set your own config?
>>>> >
>>>> > If there are any gaps, +1 on having parity within SparkConf/SparkContext
>>>> > where possible. In my use case, we launch our jobs programmatically. In
>>>> > theory, we could shell out to spark-submit but it's not the best option
>>>> > for us.
>>>> >
>>>> > So far, we are only using Standalone Cluster mode, so I'm not
>>>> > knowledgeable on the complexities of other modes, though.
>>>> >
>>>> > -Suren
>>>> >
>>>> >
>>>> >
>>>> > On Wed, Jul 9, 2014 at 8:20 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>> >
>>>> >> not sure I understand why unifying how you submit an app for different
>>>> >> platforms and dynamic configuration cannot be part of SparkConf and
>>>> >> SparkContext?
>>>> >>
>>>> >> for classpath a simple script similar to "hadoop classpath" that shows
>>>> >> what needs to be added should be sufficient.
>>>> >>
>>>> >> on spark standalone I can launch a program just fine with just SparkConf
>>>> >> and SparkContext. not on yarn, so the spark-submit script must be doing
>>>> >> a few extra things there that I am missing... which makes things more
>>>> >> difficult because I am not sure it's realistic to expect every
>>>> >> application that needs to run something on spark to be launched using
>>>> >> spark-submit.
>>>> >> On Jul 9, 2014 3:45 AM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>>>> >>
>>>> >>> It fulfills a few different functions. The main one is giving users a
>>>> >>> way to inject Spark as a runtime dependency separately from their
>>>> >>> program and make sure they get exactly the right version of Spark. So
>>>> >>> a user can bundle an application and then use spark-submit to send it
>>>> >>> to different types of clusters (or using different versions of Spark).
>>>> >>>
>>>> >>> It also unifies the way you bundle and submit an app for Yarn, Mesos,
>>>> >>> etc... this was something that became very fragmented over time before
>>>> >>> this was added.
>>>> >>>
>>>> >>> Another feature is allowing users to set configuration values
>>>> >>> dynamically rather than compile them inside of their program. That's
>>>> >>> the one you mention here. You can choose to use this feature or not.
>>>> >>> If you know your configs are not going to change, then you don't need
>>>> >>> to set them with spark-submit.
>>>> >>>
>>>> >>>
>>>> >>> On Wed, Jul 9, 2014 at 10:22 AM, Robert James <srobertja...@gmail.com> wrote:
>>>> >>> > What is the purpose of spark-submit? Does it do anything outside of
>>>> >>> > the standard val conf = new SparkConf ... val sc = new SparkContext
>>>> >>> > ... ?
>>>> >>>
>>>> >>
>>>> >
>>>> >
>>>> > --
>>>> >
>>>> > SUREN HIRAMAN, VP TECHNOLOGY
>>>> > Velos
>>>> > Accelerating Machine Learning
>>>> >
>>>> > 440 NINTH AVENUE, 11TH FLOOR
>>>> > NEW YORK, NY 10001
>>>> > O: (917) 525-2466 ext. 105
>>>> > F: 646.349.4063
>>>> > E: suren.hira...@velos.io
>>>> > W: www.velos.io
>>>> >
>>>>
>>>
>>>
>>
>
