I am running Spark on Mesos and it works quite well. I have three users, each of whom has IPython notebooks set up to instantiate a Spark instance to work with in the notebook. I love it so far.
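For reference, the auto-instantiation is roughly the following, done in a per-profile IPython startup file (the path, app name, and Mesos master URL here are placeholders, not my exact config):

    # ~/.ipython/profile_default/startup/00-spark.py  (placeholder path)
    # Runs when each notebook kernel starts, which is why every
    # notebook currently ends up with its own driver.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("notebook-adhoc")                       # placeholder name
            .setMaster("mesos://zk://mesos-master:2181/mesos")  # placeholder master URL
            .set("spark.mesos.coarse", "false"))                # fine-grained mode
    sc = SparkContext(conf=conf)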
Since I am "auto" instantiating (I don't want a user to have to "think" about instantiating and submitting a spark app to do adhoc analysis, I want the environment setup ahead of time) this is done whenever an iPython notebook is open. So far it's working pretty good, save one issue: Every notebook is a new driver. I.e. every time they open a notebook, a new spark submit is called, and the driver resources are allocated, regardless if they are used or not. Yes, it's only the driver, but even that I find starts slowing down my queries for the notebooks that using spark. (I am running in Mesos Fined Grained mode). I have three users on my system, ideally, I would love to find a way so that on the first notebook being opened, a driver is started for that user, and then can be used for any notebook the user has open. So if they open a new notebook, I can check that yes, the user has a spark driver running, and thus, that notebook, if there is a query, will run it through that driver. That allows me to understand the resource allocation better, and it limits users from running 10 notebooks and having a lot of resources. The other thing I was wondering is could the driver actually be run on the mesos cluster? Right now, I have a "edge" node as an iPython server, the drivers all exist on that server, so as I get more and more drivers, the box's local resources get depleted with unused drivers. Obviously if I could reuse the drivers per user, on that box, that is great first step, but if I could reuse drivers, and run them on the cluster, that would be ideal. looking through the docs I was not clear on those options. If anyone could point me in the right direction, I would greatly appreciate it! John --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org