That could be the behavior, but spark.mesos.executor.home being unset still raises an exception inside the dispatcher, preventing a Docker container from even being started. I can check whether other properties are inherited from the default environment when that's set, if you'd like.
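To make the failure concrete, here's a toy sketch in Python of what I think is happening. This is my own illustration of the flow, not Spark's actual code; only the property name and the exception message are real, the function and its merge logic are hypothetical:

```python
# Hypothetical sketch (NOT Spark's actual implementation) of why the
# dispatcher raises: it validates only the properties sent with the
# submission and never consults defaults baked into the executor's
# docker image.
def build_driver_desc(submitted_props, executor_image_defaults):
    """Mimic the dispatcher building a driver description.

    submitted_props: dict of --conf values passed to spark-submit
    executor_image_defaults: dict of defaults inside the docker image
                             (ignored here, as they appear to be today)
    """
    props = dict(submitted_props)  # image defaults are never merged in
    if "spark.mesos.executor.home" not in props:
        raise RuntimeError(
            "Executor Spark home `spark.mesos.executor.home` is not set!")
    return props

# Passing the property explicitly on submission works, even though the
# image default alone would not:
desc = build_driver_desc(
    {"spark.mesos.executor.home": "/usr/local/spark"},  # from --conf
    {"spark.mesos.executor.home": "/opt/spark"})        # image default
```

With an empty `submitted_props`, this raises before anything is launched, which matches the dispatcher dying on a misconfigured submission.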
I think the main problem is just that premature validation is being done on the dispatcher, and the dispatcher crashes in the event of bad config.

- Alan

On Sat, Sep 19, 2015 at 11:03 AM, Timothy Chen <t...@mesosphere.io> wrote:

> You can still provide properties through the docker container by putting
> configuration in the conf directory, but we try to pass through all
> properties submitted via spark-submit, which I believe will override the
> defaults.
>
> This is not what you are seeing?
>
> Tim
>
> On Sep 19, 2015, at 9:01 AM, Alan Braithwaite <a...@cloudflare.com> wrote:
>
> The assumption is that the executor has no default properties set in its
> environment through the docker container. Correct me if I'm wrong, but any
> properties which are unset in the SparkContext will come from the
> environment of the executor, will they not?
>
> Thanks,
> - Alan
>
> On Sat, Sep 19, 2015 at 1:09 AM, Tim Chen <t...@mesosphere.io> wrote:
>
>> I guess I need a bit more clarification: what kind of assumptions was the
>> dispatcher making?
>>
>> Tim
>>
>> On Thu, Sep 17, 2015 at 10:18 PM, Alan Braithwaite <a...@cloudflare.com>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for the follow-up. It's not so much that I expect the executor
>>> to inherit the configuration of the dispatcher as that I *don't* expect
>>> the dispatcher to make assumptions about the system environment of the
>>> executor (since it lives in a Docker container). I could potentially see
>>> a case where you might want to explicitly forbid the defaults, but I
>>> can't think of any right now.
>>>
>>> Otherwise, I'm confused as to why the defaults in the docker image for
>>> the executor are just ignored. I suppose it's the dispatcher's job to
>>> ensure the *exact* configuration of the executor, regardless of the
>>> defaults set on the executor's machine? Is that the assumption being
>>> made?
>>> I can understand that in contexts which aren't Docker-driven, since
>>> jobs could be rolling out in the middle of a config update. I'm trying
>>> to think of this outside the terms of just Mesos/Docker (since I'm
>>> fully aware that Docker doesn't rule the world yet).
>>>
>>> So I can see this from both perspectives now, and passing in the
>>> properties file will probably work just fine for me, but for my better
>>> understanding: when the executor starts, will it read any of the
>>> environment that it's executing in, or will it take only the properties
>>> given to it by the dispatcher and nothing more?
>>>
>>> Lemme know if anything needs more clarification, and thanks for your
>>> Mesos contribution to Spark!
>>>
>>> - Alan
>>>
>>> On Thu, Sep 17, 2015 at 5:03 PM, Timothy Chen <t...@mesosphere.io> wrote:
>>>
>>>> Hi Alan,
>>>>
>>>> If I understand correctly, you are setting executor home when you
>>>> launch the dispatcher and not in the configuration when you submit the
>>>> job, and expect it to inherit that configuration?
>>>>
>>>> When I worked on the dispatcher I was assuming all configuration is
>>>> passed to the dispatcher to launch the job exactly how you would need
>>>> to launch it in client mode.
>>>>
>>>> But indeed it shouldn't crash the dispatcher; I'll take a closer look
>>>> when I get a chance.
>>>>
>>>> Can you recommend changes to the documentation, either in email or a
>>>> PR?
>>>>
>>>> Thanks!
>>>>
>>>> Tim
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Sep 17, 2015, at 12:29 PM, Alan Braithwaite <a...@cloudflare.com>
>>>> wrote:
>>>>
>>>> Hey All,
>>>>
>>>> To bump this thread once again, I'm having some trouble using the
>>>> dispatcher as well.
>>>>
>>>> I'm using the Mesos cluster manager with Docker executors. I've
>>>> deployed the dispatcher as a Marathon job. When I submit a job using
>>>> spark-submit, the dispatcher writes back that the submission was
>>>> successful and then promptly dies in Marathon.
>>>> Looking at the logs reveals it was hitting the following line:
>>>>
>>>> 398: throw new SparkException("Executor Spark home
>>>> `spark.mesos.executor.home` is not set!")
>>>>
>>>> Which is odd, because it's set in multiple places (SPARK_HOME,
>>>> spark.mesos.executor.home, spark.home, etc.). Reading the code, it
>>>> appears that the driver description pulls only from the request and
>>>> disregards any other properties that may be configured. Testing by
>>>> passing --conf spark.mesos.executor.home=/usr/local/spark on the
>>>> command line to spark-submit confirms this. We're trying to reduce the
>>>> number of places where we have to set properties within Spark, and
>>>> were hoping it would be possible to have this pull in
>>>> spark-defaults.conf from somewhere, or at least allow the user to
>>>> inform the dispatcher through spark-submit that those properties will
>>>> be available once the job starts.
>>>>
>>>> Finally, I don't think the dispatcher should crash in this event. It
>>>> seems not exceptional for a job to be misconfigured when submitted.
>>>>
>>>> Please direct me to the right path if I'm headed in the wrong
>>>> direction. Also let me know if I should open some tickets for these
>>>> issues.
>>>>
>>>> Thanks,
>>>> - Alan
>>>>
>>>> On Fri, Sep 11, 2015 at 1:05 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Yes, you can create an issue, or actually contribute a patch to
>>>>> update it :)
>>>>>
>>>>> Sorry the docs are a bit light; I'm going to make them more complete
>>>>> along the way.
>>>>>
>>>>> Tim
>>>>>
>>>>> On Fri, Sep 11, 2015 at 11:11 AM, Tom Waterhouse (tomwater) <
>>>>> tomwa...@cisco.com> wrote:
>>>>>
>>>>>> Tim,
>>>>>>
>>>>>> Thank you for the explanation. You are correct, my Mesos experience
>>>>>> is very light, and I haven't deployed anything via Marathon yet.
>>>>>> What you have stated here makes sense; I will look into doing this.
>>>>>>
>>>>>> Adding this info to the docs would be great.
>>>>>> Is the appropriate action to create an issue regarding improvement
>>>>>> of the docs? For those of us who are gaining the experience, having
>>>>>> such a pointer is very helpful.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> From: Tim Chen <t...@mesosphere.io>
>>>>>> Date: Thursday, September 10, 2015 at 10:25 AM
>>>>>> To: Tom Waterhouse <tomwa...@cisco.com>
>>>>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>>>> Subject: Re: Spark on Mesos with Jobs in Cluster Mode Documentation
>>>>>>
>>>>>> Hi Tom,
>>>>>>
>>>>>> Sorry the documentation isn't really rich; it's probably assuming
>>>>>> users understand how Mesos and frameworks work.
>>>>>>
>>>>>> First I need to explain the rationale for creating the dispatcher.
>>>>>> If you're not familiar with Mesos yet: each node in your datacenter
>>>>>> has a Mesos slave installed, which is responsible for publishing
>>>>>> resources and running/watching tasks, and the Mesos master is
>>>>>> responsible for taking the aggregated resources and scheduling them
>>>>>> among frameworks.
>>>>>>
>>>>>> Frameworks are not managed by Mesos: the Mesos master/slave doesn't
>>>>>> launch and maintain frameworks, but assumes they're launched and
>>>>>> kept running on their own. All the existing frameworks in the
>>>>>> ecosystem therefore have their own ways to deploy, handle HA, and
>>>>>> persist state (e.g. Aurora, Marathon, etc.).
>>>>>>
>>>>>> Therefore, to introduce cluster mode with Mesos, we had to create a
>>>>>> long-running framework that can run in your datacenter and can
>>>>>> handle launching Spark drivers on demand, handle HA, etc. This is
>>>>>> what the dispatcher is all about.
>>>>>>
>>>>>> So the idea is that you should launch the dispatcher not on the
>>>>>> client, but on a machine in your datacenter. In Mesosphere's DCOS we
>>>>>> launch all frameworks and long-running services with Marathon, and
>>>>>> you can use Marathon to launch the Spark dispatcher.
>>>>>> Then, instead of specifying the Mesos master URL (e.g.
>>>>>> mesos://mesos.master:2181), all clients just talk to the dispatcher
>>>>>> (mesos://spark-dispatcher.mesos:7077), and the dispatcher will start
>>>>>> and watch the driver for you.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On Thu, Sep 10, 2015 at 10:13 AM, Tom Waterhouse (tomwater) <
>>>>>> tomwa...@cisco.com> wrote:
>>>>>>
>>>>>>> After spending most of yesterday scouring the Internet for sources
>>>>>>> of documentation for submitting Spark jobs in cluster mode to a
>>>>>>> Spark cluster managed by Mesos, I was able to do just that, but I
>>>>>>> am not convinced that how I have things set up is correct.
>>>>>>>
>>>>>>> I used the published Mesos
>>>>>>> <https://open.mesosphere.com/getting-started/datacenter/install/>
>>>>>>> instructions for setting up my Mesos cluster. I have three
>>>>>>> Zookeeper instances, three Mesos master instances, and three Mesos
>>>>>>> slave instances. This is all running in OpenStack.
>>>>>>>
>>>>>>> The documentation on the Spark documentation site states that “To
>>>>>>> use cluster mode, you must start the MesosClusterDispatcher in your
>>>>>>> cluster via the sbin/start-mesos-dispatcher.sh script, passing in
>>>>>>> the Mesos master url (e.g: mesos://host:5050).” That is it, no
>>>>>>> more information than that. So that is what I did: I have one
>>>>>>> machine that I use as the Spark client for submitting jobs. I
>>>>>>> started the Mesos dispatcher with the script as described and,
>>>>>>> using the client machine’s IP address and port as the target,
>>>>>>> submitted the job.
>>>>>>>
>>>>>>> The job is currently running in Mesos as expected. This is not,
>>>>>>> however, how I would have expected to configure the system. As it
>>>>>>> stands, there is one instance of the Spark Mesos dispatcher running
>>>>>>> outside of Mesos, and so not part of the sphere of Mesos resource
>>>>>>> management.
>>>>>>> I used the following Stack Overflow posts as guidelines:
>>>>>>> http://stackoverflow.com/questions/31164725/spark-mesos-dispatcher
>>>>>>> http://stackoverflow.com/questions/31294515/start-spark-via-mesos
>>>>>>>
>>>>>>> There must be better documentation on how to deploy Spark on Mesos
>>>>>>> with jobs able to be deployed in cluster mode.
>>>>>>>
>>>>>>> I can follow up with more specific information regarding my
>>>>>>> deployment if necessary.
>>>>>>>
>>>>>>> Tom
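For reference, the Marathon route Tim describes earlier in the thread might look roughly like the app definition below. This is an untested sketch: the Spark install path, ZooKeeper hosts, port, and resource sizes are all placeholders for your environment. Because sbin/start-mesos-dispatcher.sh daemonizes, this sketch invokes the MesosClusterDispatcher class directly via spark-class, so Marathon has a foreground process to supervise and restart.

```json
{
  "id": "/spark-mesos-dispatcher",
  "cmd": "/opt/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosClusterDispatcher --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos --port 7077",
  "cpus": 1,
  "mem": 1024,
  "instances": 1,
  "ports": [7077]
}
```

Clients would then submit with spark-submit pointed at the dispatcher (e.g. --master mesos://spark-dispatcher.mesos:7077 --deploy-mode cluster), per Tim's note above.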