By using a combination of Spark's dynamic allocation (http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup) and a framework scheduler like Cook (https://github.com/twosigma/Cook/tree/master/spark), you can achieve the desired auto-scaling effect without the overhead of managing roles/constraints in Mesos. I'd be happy to discuss this in more detail if you decide to give it a try.
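
A minimal sketch of what the dynamic-allocation side could look like (the app name and the numbers are placeholders, and the external shuffle service has to be running on each agent for dynamic allocation to work):

    import org.apache.spark.sql.SparkSession

    // Sketch only: dynamic allocation lets the driver grow and shrink its
    // executor count with load instead of holding a static set of machines.
    val spark = SparkSession.builder()
      .appName("sql-rest-server")                          // placeholder name
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")     // required by dynamic allocation
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "50")
      .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
      .getOrCreate()

A framework scheduler such as Cook then arbitrates how many offers each framework actually receives, so the tagging/untagging step is no longer needed for isolation.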
On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta <mehta.ashis...@gmail.com> wrote:
> Hi,
>
> We want to move to auto-scaling of the Spark driver, wherein more resources
> are added to the pool available to the "Spark driver" based on demand. The
> demand can increase or decrease with the number of SQL queries being run
> over the REST server, or with the number of users running queries over the
> Thrift server on Spark (HiveServer2).
>
> *Existing approach with static number of resources:*
> We have a very large pool of resources, but my "Spark driver" is allocated
> a limited amount of "static" resources. We achieve this as follows:
>
>    1. While running the application, I tag machines in Mesos with the name
>    of my application, so that offers are made accordingly.
>    2. My application runs with a constraint on the above tagged machines
>    using the "spark.mesos.constraints" configuration, so that it only
>    accepts offers made by these tagged machines and doesn't eat up all the
>    resources in my very large cluster.
>    3. The application launches executors on these accepted offers, and they
>    are used to do the computation defined by the Spark job, or as and when
>    queries are fired over the HTTP/Thrift server.
>
> *Approach for auto scaling:*
> Auto-scaling of the driver helps us in many ways and lets us use the
> resources more efficiently.
> To enable auto-scaling, where my Spark application gets more and more
> resource offers once it has consumed all the available resources, the
> workflow will be as follows:
>
>    1. Run a daemon to monitor my app on Mesos.
>    2. Keep adding/removing machines for the application by
>    tagging/untagging them, based on the application's resource-usage
>    metrics in Mesos.
>    3. Scale up/down based on step 2 by tagging and untagging, taking
>    "some buffer" into account.
>
> I wanted to know your opinion on the "*approach for auto scaling*".
> Is this the right approach to auto-scale the Spark driver?
> Tagging/untagging machines is also how we limit/manage the resources in
> our big cluster.
>
> Thanks,
> Ashish
>
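
For comparison, the static setup described in steps 1-2 of the quoted mail corresponds roughly to the sketch below; "app_tag" and its value are placeholders for whatever attribute the tagging daemon sets on the Mesos agents:

    import org.apache.spark.sql.SparkSession

    // Sketch only: restrict the driver to offers from agents carrying a
    // specific attribute, and cap total cores so the application cannot
    // consume the whole cluster.
    val spark = SparkSession.builder()
      .appName("sql-rest-server")                           // placeholder name
      .config("spark.mesos.constraints", "app_tag:my_app")  // attribute:value set on tagged agents
      .config("spark.cores.max", "64")                      // static cap on total resources
      .getOrCreate()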