On Thu, Feb 19, 2015 at 2:49 PM, John Omernik <j...@omernik.com> wrote:

> I am running Spark on Mesos and it works quite well.  I have three
> users, all of whom use iPython notebooks that instantiate a Spark
> instance to work with in the notebooks. I love it so far.
>
> Since I am "auto" instantiating (I don't want a user to have to
> "think" about instantiating and submitting a Spark app to do ad-hoc
> analysis; I want the environment set up ahead of time), this is done
> whenever an iPython notebook is opened.  So far it's working pretty
> well, save one issue:
>
> Every notebook is a new driver, i.e. every time they open a notebook,
> a new spark-submit is called and the driver resources are allocated,
> regardless of whether they are used or not.  Yes, it's only the driver,
> but even that, I find, starts slowing down my queries for the notebooks
> that use Spark.  (I am running in Mesos fine-grained mode.)
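(For reference, the per-kernel setup described here usually looks roughly like the sketch below: an IPython startup file that creates a SparkContext against the Mesos master every time a kernel starts. The profile path, master URL, and app name are illustrative guesses, not taken from this setup.)

    # Hypothetical IPython startup file, e.g.
    # ~/.ipython/profile_pyspark/startup/00-spark.py
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("mesos://zk://zk1:2181/mesos")  # assumed Mesos master URL
            .setAppName("notebook")
            .set("spark.mesos.coarse", "false"))       # fine-grained mode

    # Because this runs on every kernel start, each open notebook
    # gets its own driver process on the edge node.
    sc = SparkContext(conf=conf)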
>
>
> I have three users on my system. Ideally, I would love to find a way
> so that when the first notebook is opened, a driver is started for
> that user and can then be used for any notebook the user has open. So
> if they open a new notebook, I can check that yes, the user has a
> Spark driver running, and thus that notebook, if there is a query,
> will run it through that driver. That lets me understand the
> resource allocation better, and it keeps users from running 10
> notebooks and tying up a lot of resources.
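(A minimal sketch of that "is a driver already running for this user" check, assuming a hypothetical per-user PID file; truly sharing one driver across separate notebook kernels would additionally need the notebooks to attach to a common kernel or gateway, which is not shown here.)

    # Illustrative only -- LOCK_DIR and the PID-file convention are made up.
    import getpass, os
    from pyspark import SparkConf, SparkContext

    LOCK_DIR = "/var/run/spark-drivers"

    def user_driver_running():
        """True if this user has already registered a live driver PID."""
        pid_file = os.path.join(LOCK_DIR, getpass.getuser() + ".pid")
        if not os.path.exists(pid_file):
            return False
        pid = int(open(pid_file).read().strip())
        try:
            os.kill(pid, 0)          # probe: does the process still exist?
            return True
        except OSError:
            return False

    if not user_driver_running():
        sc = SparkContext(conf=SparkConf().setAppName(getpass.getuser() + "-driver"))
        with open(os.path.join(LOCK_DIR, getpass.getuser() + ".pid"), "w") as f:
            f.write(str(os.getpid()))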
>
> The other thing I was wondering: could the driver actually be run on
> the Mesos cluster? Right now, I have an "edge" node as an iPython
> server; the drivers all live on that server, so as I get more and
> more drivers, the box's local resources get depleted by unused
> drivers.  Obviously, if I could reuse the drivers per user on that
> box, that would be a great first step, but if I could reuse drivers
> and run them on the cluster, that would be ideal.  Looking through
> the docs, I was not clear on those options. If anyone could point me
> in the right direction, I would greatly appreciate it!
>

Cluster mode support for Spark on Mesos is tracked under SPARK-5338
(https://issues.apache.org/jira/browse/SPARK-5338). I know Tim Chen is
working on it, so there will be progress soon.
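
For comparison, this is what cluster mode already looks like on the
standalone manager, where spark-submit hands the driver off to run inside
the cluster instead of on the submitting machine; something analogous for
Mesos is what SPARK-5338 tracks (the host, class, and jar names below are
placeholders):

    ./bin/spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --class org.example.MyApp \
      my-app.jar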

iulian


>
> John
>


--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com
