Hi John, I think there are limitations in the way drivers are designed that require a separate JVM process per driver, so it's not possible without code and design changes, AFAIK.
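A minimal sketch of that one-driver-per-context constraint, assuming the PySpark 1.x API current at the time of this thread (the app name is illustrative, and the local master is a placeholder so the snippet runs on its own; John's setup would point at the Mesos master instead):

from pyspark import SparkConf, SparkContext

# Each SparkContext is backed by its own driver process, started when the
# context is created; two notebooks running this cell therefore get two
# independent drivers that cannot be shared.
conf = (SparkConf()
        .setAppName("notebook-session")   # illustrative app name
        .setMaster("local[*]"))           # placeholder; a real setup would use the cluster master
sc = SparkContext(conf=conf)

# Spark also refuses to run a second active context in the same process:
# creating another SparkContext here raises an error until sc.stop() is called.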
A driver shouldn't stay open past your job's lifetime though, so while it isn't shared between apps, it shouldn't be wasting as much as you described.

Tim

> On Feb 27, 2015, at 7:50 AM, John Omernik <j...@omernik.com> wrote:
>
> All - I've asked this question before, and probably due to my own poor comprehension or my clumsy way of asking, I am still unclear on the answer. I'll try again, this time using crude visual aids.
>
> I am using iPython Notebooks with Jupyter Hub (a multi-user notebook server). To make the environment really smooth for data exploration, I am creating a Spark context every time a notebook is opened. (See image below.)
>
> This can cause issues on my "analysis" (Jupyter Hub) server: say the driver uses 1024 MB; each notebook, regardless of how much Spark is actually used, opens up a driver. Yes, I should probably set it up to only create the context on demand, but that would cause additional delay. Another issue is that once contexts are created, they are not closed until the notebook is halted, so users could leave notebook kernels running, wasting additional resources.
>
> <Current.png>
>
> What I would like to do is share a context per user. Basically, each user on the system would get only one Spark context, and all ad hoc queries or work would be sent through one driver. This makes sense to me, as users will often want ad hoc Spark capabilities, and this lets a context sit open, ready for ad hoc work, while at the same time not being over the top in resource usage, especially if a kernel is left open.
>
> <Shared.png>
>
> On the Mesos list I was made aware of SPARK-5338, which Tim Chen is working on. Based on conversations with him, this wouldn't completely achieve what I am looking for, in that each notebook would likely still start a Spark context, but at least the Spark driver would reside on the cluster and thereby be resource-managed by the cluster. One thing to note here: if the design is similar to the YARN cluster design, then my iPython setup may not work at all with Tim's approach, in that the shells (if I remember correctly) don't work in cluster mode on YARN.
>
> <SPARK-5338.png>
>
> Barring that though (the PySpark shell not working in cluster mode), if drivers could be shared per user like I initially proposed, run on the cluster as Tim proposed, and the shells still worked in cluster mode, that would be ideal. We'd have everything running on the cluster, and we wouldn't have wasted or left-open drivers using up resources.
>
> <Shared-SPARK5338.png>
>
> So I guess, ideally, what keeps us from:
>
> A. (in YARN cluster mode) using the driver in the cluster
> B. sharing drivers
>
> My guess is I may be missing something fundamental here in how Spark is supposed to work, but I see this as a more efficient use of resources for this type of work. I may also look into creating some Docker containers and see how those work, but ideally I'd like to understand this at a base level... i.e., why can't cluster (YARN and Mesos) contexts be connected to the way a Spark standalone cluster context can?
>
> Thanks!
>
> John
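As a footnote to the thread: a rough sketch of the "create the context on demand" workaround John mentions, again assuming the PySpark 1.x API of the time (get_or_create_context and release_context are hypothetical helper names, and the local master is a placeholder for the real cluster master). It only defers the per-notebook driver cost; it does not share one driver across kernels, which, as Tim notes, would require design changes in Spark itself.

from pyspark import SparkConf, SparkContext

_sc = None  # one context per notebook kernel, created lazily

def get_or_create_context(app_name="adhoc-notebook", master="local[*]"):
    """Return this kernel's SparkContext, creating it on first use."""
    global _sc
    if _sc is None:
        conf = SparkConf().setAppName(app_name).setMaster(master)
        _sc = SparkContext(conf=conf)
    return _sc

def release_context():
    """Stop the context so an idle kernel stops holding driver resources."""
    global _sc
    if _sc is not None:
        _sc.stop()
        _sc = None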