Hi, I have 5 Spark jobs that need to run in parallel to speed things up; together they take around 6-8 hours. I have 93 container nodes with 8 cores each and a total memory capacity of around 2.8 TB. Currently I run each job with around 30 executors, each with 2 cores and 20 GB of memory. Each job processes around 1 TB of data.

Since the cluster is shared, many other teams spawn their jobs alongside mine. YARN then kills my executors and does not add them back, because the cluster is running at maximum capacity. I would like to know the best practices for such a resource-crunched environment. These jobs run every day, so I am looking for innovative approaches to this problem. Before anyone suggests a dedicated cluster of our own: that is not an option, so I am looking for alternative solutions. Please guide me. Thanks in advance.
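For reference, the per-job setup described above would correspond to a spark-submit invocation roughly like the following sketch; the jar name, main class, and queue are placeholders, not details from the original post:

```shell
# Hypothetical invocation matching the stated configuration:
# ~30 executors per job, 2 cores and 20 GB memory each, on YARN.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue default \
  --num-executors 30 \
  --executor-cores 2 \
  --executor-memory 20g \
  --class com.example.MyBatchJob \
  my-batch-job.jar
```

With 5 such jobs running in parallel, this requests about 150 executors, 300 cores, and ~3 TB of executor memory in total, which is why contention with other tenants leads YARN to preempt executors.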
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-scheduling-Spark-jobs-on-shared-YARN-cluster-using-Autosys-tp24820.html Sent from the Apache Spark User List mailing list archive at Nabble.com.