By using a combination of Spark's dynamic allocation
(http://spark.apache.org/docs/latest/job-scheduling.html#configuration-and-setup)
and a framework scheduler like Cook
(https://github.com/twosigma/Cook/tree/master/spark), you can achieve the
desired auto-scaling effect without the overhead of managing
roles/constraints in Mesos. I'd be happy to discuss this in more detail if
you decide to give it a try.
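
For reference, here is a minimal sketch of the Spark side of that setup
(the config keys are the standard dynamic-allocation settings; the app name
and the numeric values are placeholders to adapt to your cluster):

    import org.apache.spark.sql.SparkSession

    // Dynamic allocation lets the driver grow and shrink its executor set on
    // demand instead of holding a static reservation.  On Mesos it also
    // requires the external shuffle service to be running on the agents.
    val spark = SparkSession.builder()
      .appName("autoscaling-sql-server")                            // placeholder
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")          // placeholder
      .config("spark.dynamicAllocation.maxExecutors", "50")         // placeholder
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .getOrCreate()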

On Mon, Feb 27, 2017 at 3:14 AM, Ashish Mehta <mehta.ashis...@gmail.com>
wrote:

> Hi,
>
> We want to move to auto-scaling of the Spark driver, where more resources
> are added to the pool available to the "spark driver" based on demand. The
> demand can increase/decrease based on multiple SQL queries being run over
> the REST server, or the number of queries from multiple users over the
> Thrift server on Spark (HiveServer2).
>
> *Existing approach with a static number of resources:*
> We have a very large pool of resources, but my "spark driver" is allocated
> only a limited, "static" amount of resources, which we achieve as follows:
>
>    1. While running the application, I tag machines in Mesos with the name
>    of my application, so that offers are made accordingly.
>    2. My application is run with a constraint for the machines tagged above
>    using the "spark.mesos.constraints" configuration, so that the application
>    only accepts offers made by these tagged machines and doesn't eat up all
>    the resources in my very large cluster (see the sketch after this list).
>    3. The application launches executors on these accepted offers, and they
>    are used to do the computation defined by the Spark job, or as and when
>    queries are fired over the HTTP/Thrift server.
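>
> For illustration, a rough sketch of how the application is launched with
> this constraint (the Mesos attribute name "app_name" and its value below
> are just examples of the tag we apply to machines):
>
>        import org.apache.spark.sql.SparkSession
>
>        // Only accept offers from Mesos agents carrying our application's
>        // tag, so the driver doesn't eat into the rest of the cluster.
>        val spark = SparkSession.builder()
>          .appName("my-spark-driver")                                  // example
>          .config("spark.mesos.constraints", "app_name:my-spark-driver")
>          .getOrCreate()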
>
> *Approach for auto-scaling:*
> Auto-scaling of the driver helps us in many ways and lets us use our
> resources more efficiently.
> To enable auto-scaling, where my Spark application receives more and more
> resource offers once it has consumed all of its available resources, the
> workflow would be as follows (a rough sketch of the scaling loop follows
> this list):
>
>    1. Run a daemon to monitor my app on Mesos.
>    2. Keep adding/removing machines for the application by tagging/untagging
>    them, based on the resource usage metrics for my application on Mesos.
>    3. Scale up/down based on step 2 by tagging and untagging, taking
>    "some buffer" into account.
>
> I wanted to get your opinion on the "*Approach for auto-scaling*" above.
> Is this the right approach to auto-scaling the Spark driver?
> Also, tagging/untagging machines is something we do to limit/manage the
> resources in our big cluster.
>
> Thanks,
> Ashish
>
