You can run multiple Spark applications simultaneously; just limit the number of
cores and the amount of memory allocated to each application. For example, if
each node has 8 cores and there are 10 nodes (80 cores total) and you want to be
able to run 4 applications simultaneously, cap each application at 20 cores.
Similarly, you can limit the amount of memory that an application can use on
each node.
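As a rough sketch (assuming a standalone or Mesos coarse-grained cluster; the
20-core and 8 GB figures just follow the example above and should be tuned for
your setup), the caps can be set on the SparkConf before the context is created:

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap this application's share of the cluster so other apps can run alongside it.
    val conf = new SparkConf()
      .setAppName("capped-app")
      .set("spark.cores.max", "20")        // total cores this app may take across the cluster
      .set("spark.executor.memory", "8g")  // memory used by this app's executor on each node

    val sc = new SparkContext(conf)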

You can also use dynamic resource allocation.
Details are here: 
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
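A minimal sketch of the dynamic allocation settings (the executor counts below
are placeholders; this also assumes the external shuffle service has been
enabled on each worker, which dynamic allocation requires):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("dynamic-app")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")        // external shuffle service must run on each worker
      .set("spark.dynamicAllocation.minExecutors", "1")    // placeholder lower bound
      .set("spark.dynamicAllocation.maxExecutors", "4")    // placeholder upper bound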

Mohammed
Author: Big Data Analytics with Spark <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Tobias Eriksson [mailto:tobias.eriks...@qvantel.com]
Sent: Tuesday, May 3, 2016 7:34 AM
To: user@spark.apache.org
Subject: Multiple Spark Applications that use Cassandra, how to share resources/nodes

Hi,
We are using Spark for a long-running job; in fact it is a REST server that
does some joins with some tables in Cassandra and returns the result.
Now we need to have multiple applications running in the same Spark cluster,
and from what I understand this is not possible, or at least somewhat
complicated:

  1.  A Spark application takes all the resources / nodes in the cluster (we
have 4 nodes, one for each Cassandra node)
  2.  A Spark application returns its resources when it is done (exits or the
context is closed/returned)
  3.  Sharing resources using Mesos only allows scaling down and then scaling
back up by a step-by-step policy, i.e. 2 nodes, 3 nodes, 4 nodes, … increasing
as the need increases
But if this is true, I cannot have several applications running in parallel,
is that right?
If I use Mesos, then the whole idea of one Spark worker per Cassandra node
fails, since each worker talks directly to its local node, and that is what
makes it so efficient.
In this case I need all the nodes, not 3 out of 4.

Any mistakes in my thinking?
Any ideas on how to solve this? It should be a common problem, I think.

-Tobias
