Hi Jeremy,

Thanks for the reply. We got Spark running on our setup after getting a similar script to work with LSF. Really appreciate your help. Will keep in touch on Twitter.

Thanks,
@sidkashyap :)
From: freeman.jer...@gmail.com
Subject: Re: Spark on an HPC setup
Date: Thu, 29 May 2014 00:37:54 -0400
To: user@spark.apache.org

Hi Sid,

We are successfully running Spark on an HPC, it works great. Here's info on our setup / approach.

We have a cluster with 256 nodes running Scientific Linux 6.3 and scheduled by Univa Grid Engine. The environment also has a DDN GRIDScaler running GPFS and several EMC Isilon clusters serving NFS to the compute cluster.

We wrote a custom qsub job to spin up Spark dynamically on a user-designated quantity of nodes. The UGE scheduler first designates a set of nodes that will be used to run Spark. Once the nodes are available, we use the start-master.sh script to launch a master, and send it the addresses of the other nodes. The master then starts the workers with start-all.sh. At that point, the Spark cluster is usable and remains active until the user issues a qdel, which triggers stop-all.sh on the master and takes down the cluster.

This worked well for us because users can pick the number of nodes to suit their job, and multiple users can run their own Spark clusters on the same system (alongside other non-Spark jobs).

We don't use HDFS for the filesystem, instead relying on NFS and GPFS, and the cluster is not running Hadoop. In tests, we've seen similar performance between our setup and Spark w/ HDFS on EC2 with higher-end instances (matched roughly for memory and number of cores).

Unfortunately we can't open source the launch scripts because they contain proprietary UGE stuff, but happy to try and answer any follow-up questions.

-- Jeremy

---------------------
Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab

On May 28, 2014, at 11:02 AM, Sidharth Kashyap <sidharth.n.kash...@outlook.com> wrote:

Hi,

Has anyone tried to get Spark working on an HPC setup? If yes, can you please share your learnings and how you went about doing it?
An HPC setup typically comes bundled with a dynamically allocated cluster and a very efficient scheduler. Configuring Spark standalone in this mode of operation is challenging, as the Hadoop dependencies need to be eliminated and the cluster needs to be configured on the fly.

Thanks,
Sid
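
For readers who want to try the approach Jeremy describes (the scheduler grants a set of nodes, one becomes the Spark master, the rest become workers), the planning step might be sketched roughly as below. This is not the script from the thread, which was not published. It assumes the scheduler hands the job a host file with one line per granted node and the hostname in the first column (Grid Engine's $PE_HOSTFILE is typically laid out this way), that the first granted node is used as the master, and that the worker list is written to Spark's conf/slaves file for start-all.sh to read. The names parse_hostfile, plan_cluster and SparkLayout, and the node01/node02/node03 hostnames, are made up for illustration.

```python
from typing import List, NamedTuple


class SparkLayout(NamedTuple):
    """Planned standalone layout for one scheduler allocation (hypothetical type)."""
    master: str         # node that will run start-master.sh
    workers: List[str]  # remaining nodes, one Spark worker each
    master_url: str     # URL that workers and drivers connect to
    slaves_file: str    # text to write to conf/slaves before start-all.sh


def parse_hostfile(text: str) -> List[str]:
    """Return unique hostnames in order of first appearance.

    Assumes the hostname is the first whitespace-separated field on each
    line; blank lines are skipped.
    """
    hosts: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def plan_cluster(hosts: List[str], port: int = 7077) -> SparkLayout:
    """Pick the first host as master and treat the rest as workers.

    7077 is the default port of the Spark standalone master. Choosing the
    first host as master is an assumption of this sketch.
    """
    if not hosts:
        raise ValueError("scheduler granted no hosts")
    master, workers = hosts[0], list(hosts[1:])
    slaves_file = "".join(host + "\n" for host in workers)
    return SparkLayout(master, workers, f"spark://{master}:{port}", slaves_file)


if __name__ == "__main__":
    # Hypothetical allocation of three nodes with 8 slots each.
    example = (
        "node01 8 all.q@node01 UNDEFINED\n"
        "node02 8 all.q@node02 UNDEFINED\n"
        "node03 8 all.q@node03 UNDEFINED\n"
    )
    layout = plan_cluster(parse_hostfile(example))
    print("master URL:", layout.master_url)
    print("conf/slaves contents:")
    print(layout.slaves_file, end="")
```

A job script built on this would write slaves_file to conf/slaves on the master node, run the Spark launch scripts there, and run stop-all.sh when the job is deleted. Those steps depend on the site's scheduler (UGE, LSF, or another) and are left out here.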