That could be the behavior, but spark.mesos.executor.home being unset still raises an exception inside the dispatcher, preventing a Docker container from even being started. I can check whether other properties are inherited from the default environment when that's set, if you'd like.
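To make the failure concrete, here's a toy sketch in Python of what I think is happening. This is my own illustration of the flow, not Spark's actual code; only the property name and the exception message are real, the function and its merge logic are hypothetical:

```python
# Hypothetical sketch (NOT Spark's actual implementation) of why the
# dispatcher raises: it validates only the properties sent with the
# submission and never consults defaults baked into the executor's
# docker image.
def build_driver_desc(submitted_props, executor_image_defaults):
    """Mimic the dispatcher building a driver description.

    submitted_props: dict of --conf values passed to spark-submit
    executor_image_defaults: dict of defaults inside the docker image
                             (ignored here, as they appear to be today)
    """
    props = dict(submitted_props)  # image defaults are never merged in
    if "spark.mesos.executor.home" not in props:
        raise RuntimeError(
            "Executor Spark home `spark.mesos.executor.home` is not set!")
    return props

# Passing the property explicitly on submission works, even though the
# image default alone would not:
desc = build_driver_desc(
    {"spark.mesos.executor.home": "/usr/local/spark"},  # from --conf
    {"spark.mesos.executor.home": "/opt/spark"})        # image default
```

With an empty `submitted_props`, this raises before anything is launched, which matches the dispatcher dying on a misconfigured submission.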
I think the main problem is just that premature validation is being done on the dispatcher, and the dispatcher crashes in the event of bad config.

- Alan

On Sat, Sep 19, 2015 at 11:03 AM, Timothy Chen <t...@mesosphere.io> wrote:

> You can still provide properties through the docker container by putting
> configuration in the conf directory, but we try to pass through all
> properties submitted via spark-submit, which I believe will override the
> defaults.
>
> This is not what you are seeing?
>
> Tim
>
> On Sep 19, 2015, at 9:01 AM, Alan Braithwaite <a...@cloudflare.com> wrote:
>
> The assumption is that the executor has no default properties set in its
> environment through the docker container. Correct me if I'm wrong, but any
> properties which are unset in the SparkContext will come from the
> environment of the executor, will they not?
>
> Thanks,
> - Alan
>
> On Sat, Sep 19, 2015 at 1:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>
>> I guess I need a bit more clarification: what kind of assumptions was the
>> dispatcher making?
>>
>> Tim
>>
>> On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for the follow-up. It's not so much that I expect the executor
>>> to inherit the configuration of the dispatcher as that I *don't* expect
>>> the dispatcher to make assumptions about the system environment of the
>>> executor (since it lives in a Docker container). I could potentially see
>>> a case where you might want to explicitly forbid the defaults, but I
>>> can't think of any right now.
>>>
>>> Otherwise, I'm confused as to why the defaults in the docker image for
>>> the executor are just ignored. I suppose it's the dispatcher's job to
>>> ensure the *exact* configuration of the executor, regardless of the
>>> defaults set on the executor's machine? Is that the assumption being
>>> made?
>>> I can understand that in contexts which aren't Docker-driven, since
>>> jobs could be rolling out in the middle of a config update. I'm trying
>>> to think of this outside the terms of just Mesos/Docker (since I'm
>>> fully aware that Docker doesn't rule the world yet).
>>>
>>> So I can see this from both perspectives now, and passing in the
>>> properties file will probably work just fine for me, but for my better
>>> understanding: when the executor starts, will it read any of the
>>> environment that it's executing in, or will it take only the properties
>>> given to it by the dispatcher and nothing more?
>>>
>>> Lemme know if anything needs more clarification, and thanks for your
>>> Mesos contribution to Spark!
>>>
>>> - Alan
>>>
>>> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>
>>>> Hi Alan,
>>>>
>>>> If I understand correctly, you are setting executor home when you
>>>> launch the dispatcher and not in the configuration when you submit the
>>>> job, and expect it to inherit that configuration?
>>>>
>>>> When I worked on the dispatcher I was assuming all configuration is
>>>> passed to the dispatcher to launch the job exactly how you would need
>>>> to launch it in client mode.
>>>>
>>>> But indeed it shouldn't crash the dispatcher; I'll take a closer look
>>>> when I get a chance.
>>>>
>>>> Can you recommend changes to the documentation, either in email or a
>>>> PR?
>>>>
>>>> Thanks!
>>>>
>>>> Tim
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com>
>>>> wrote:
>>>>
>>>> Hey All,
>>>>
>>>> To bump this thread once again, I'm having some trouble using the
>>>> dispatcher as well.
>>>>
>>>> I'm using the Mesos cluster manager with Docker executors. I've
>>>> deployed the dispatcher as a Marathon job. When I submit a job using
>>>> spark-submit, the dispatcher writes back that the submission was
>>>> successful and then promptly dies in Marathon.
>>>> Looking at the logs reveals it was hitting the following line:
>>>>
>>>> 398: throw new SparkException("Executor Spark home
>>>> `spark.mesos.executor.home` is not set!")
>>>>
>>>> Which is odd, because it's set in multiple places (SPARK_HOME,
>>>> spark.mesos.executor.home, spark.home, etc.). Reading the code, it
>>>> appears that the driver description pulls only from the request and
>>>> disregards any other properties that may be configured. Testing by
>>>> passing --conf spark.mesos.executor.home=/usr/local/spark on the
>>>> command line to spark-submit confirms this. We're trying to reduce the
>>>> number of places where we have to set properties within Spark, and
>>>> were hoping it would be possible to have this pull in
>>>> spark-defaults.conf from somewhere, or at least allow the user to
>>>> inform the dispatcher through spark-submit that those properties will
>>>> be available once the job starts.
>>>>
>>>> Finally, I don't think the dispatcher should crash in this event. It
>>>> seems not exceptional for a job to be misconfigured when submitted.
>>>>
>>>> Please direct me to the right path if I'm headed in the wrong
>>>> direction. Also let me know if I should open some tickets for these
>>>> issues.
>>>>
>>>> Thanks,
>>>> - Alan
>>>>
>>>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Yes, you can create an issue, or actually contribute a patch to
>>>>> update it :)
>>>>>
>>>>> Sorry the docs are a bit light; I'm going to make them more complete
>>>>> along the way.
>>>>>
>>>>> Tim
>>>>>
>>>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
>>>>> tomwa...@cisco.com> wrote:
>>>>>
>>>>>> Tim,
>>>>>>
>>>>>> Thank you for the explanation. You are correct, my Mesos experience
>>>>>> is very light, and I haven't deployed anything via Marathon yet.
>>>>>> What you have stated here makes sense; I will look into doing this.
>>>>>>
>>>>>> Adding this info to the docs would be great.
>>>>>> Is the appropriate action to create an issue regarding improvement
>>>>>> of the docs? For those of us who are gaining the experience, having
>>>>>> such a pointer is very helpful.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> From: Tim Chen <t...@mesosphere.io>
>>>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>>>
>>>>>> Hi Tom,
>>>>>>
>>>>>> Sorry the documentation isn't really rich; it's probably assuming
>>>>>> users understand how Mesos and frameworks work.
>>>>>>
>>>>>> First I need to explain the rationale for creating the dispatcher.
>>>>>> If you're not familiar with Mesos yet: each node in your datacenter
>>>>>> has a Mesos slave installed, which is responsible for publishing
>>>>>> resources and running/watching tasks, and the Mesos master is
>>>>>> responsible for taking the aggregated resources and scheduling them
>>>>>> among frameworks.
>>>>>>
>>>>>> Frameworks are not managed by Mesos: the Mesos master/slave doesn't
>>>>>> launch and maintain frameworks, but assumes they're launched and
>>>>>> kept running on their own. All the existing frameworks in the
>>>>>> ecosystem therefore have their own ways to deploy, handle HA, and
>>>>>> persist state (e.g. Aurora, Marathon, etc.).
>>>>>>
>>>>>> Therefore, to introduce cluster mode with Mesos, we had to create a
>>>>>> long-running framework that can run in your datacenter and can
>>>>>> handle launching Spark drivers on demand, handle HA, etc. This is
>>>>>> what the dispatcher is all about.
>>>>>>
>>>>>> So the idea is that you should launch the dispatcher not on the
>>>>>> client, but on a machine in your datacenter. In Mesosphere's DCOS we
>>>>>> launch all frameworks and long-running services with Marathon, and
>>>>>> you can use Marathon to launch the Spark dispatcher.
>>>>>> Then, instead of specifying the Mesos master URL (e.g.
>>>>>> mesos://mesos.master:2181), all clients just talk to the dispatcher
>>>>>> (mesos://spark-dispatcher.mesos:7077), and the dispatcher will start
>>>>>> and watch the driver for you.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
>>>>>> tomwa...@cisco.com> wrote:
>>>>>>
>>>>>>> After spending most of yesterday scouring the Internet for sources
>>>>>>> of documentation for submitting Spark jobs in cluster mode to a
>>>>>>> Spark cluster managed by Mesos, I was able to do just that, but I
>>>>>>> am not convinced that how I have things set up is correct.
>>>>>>>
>>>>>>> I used the published Mesos
>>>>>>> <https://open.mesosphere.com/getting-started/datacenter/install/>
>>>>>>> instructions for setting up my Mesos cluster. I have three
>>>>>>> Zookeeper instances, three Mesos master instances, and three Mesos
>>>>>>> slave instances. This is all running in OpenStack.
>>>>>>>
>>>>>>> The documentation on the Spark documentation site states that “To
>>>>>>> use cluster mode, you must start the MesosClusterDispatcher in your
>>>>>>> cluster via the sbin/start-mesos-dispatcher.sh script, passing in
>>>>>>> the Mesos master url (e.g: mesos://host:5050).” That is it, no
>>>>>>> more information than that. So that is what I did: I have one
>>>>>>> machine that I use as the Spark client for submitting jobs. I
>>>>>>> started the Mesos dispatcher with the script as described and,
>>>>>>> using the client machine’s IP address and port as the target,
>>>>>>> submitted the job.
>>>>>>>
>>>>>>> The job is currently running in Mesos as expected. This is not,
>>>>>>> however, how I would have expected to configure the system. As it
>>>>>>> stands, there is one instance of the Spark Mesos dispatcher running
>>>>>>> outside of Mesos, and so not part of the sphere of Mesos resource
>>>>>>> management.
>>>>>>> I used the following Stack Overflow posts as guidelines:
>>>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>>>
>>>>>>> There must be better documentation on how to deploy Spark on Mesos
>>>>>>> with jobs able to be deployed in cluster mode.
>>>>>>>
>>>>>>> I can follow up with more specific information regarding my
>>>>>>> deployment if necessary.
>>>>>>>
>>>>>>> Tom
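For reference, the Marathon route Tim describes earlier in the thread might look roughly like the app definition below. This is an untested sketch: the Spark install path, ZooKeeper hosts, port, and resource sizes are all placeholders for your environment. Because sbin/start-mesos-dispatcher.sh daemonizes, this sketch invokes the MesosClusterDispatcher class directly via spark-class, so Marathon has a foreground process to supervise and restart.

```json
{
  "id": "/spark-mesos-dispatcher",
  "cmd": "/opt/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos --port 7077",
  "cpus": 1,
  "mem": 1024,
  "instances": 1,
  "ports": [7077]
}
```

Clients would then submit with spark-submit pointed at the dispatcher (e.g. --master mesos://spark-dispatcher.mesos:7077 --deploy-mode cluster), per Tim's note above.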