Hi Robert,

If you're running Spark against YARN, you don't need to install anything Spark-specific on the nodes. For each application, the client will copy the Spark jar to HDFS, where the Spark processes can fetch it. For faster app startup, you can copy the Spark jar to a public location on HDFS once and point applications at it there; the YARN NodeManagers will cache it on each node after the first fetch.
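As a sketch, the one-time upload plus the client-side setting that points at it could look like the following. The paths, jar version, and application class are made-up examples; `spark.yarn.jar` is the config key Spark's YARN client uses for this:

```shell
# One-time step: put the Spark assembly jar in a shared HDFS location.
# (Path and jar name below are illustrative.)
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put spark-assembly-1.0.0-hadoop2.2.0.jar /user/spark/share/lib/

# Per application: point the client at the cached jar so it skips the
# per-app upload. NodeManagers cache the jar locally after the first fetch.
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.jar=hdfs:///user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar \
  --class com.example.MyApp \
  my-app.jar
```

Without `spark.yarn.jar` set, each submission re-uploads the assembly from the client, which is what slows startup.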
-Sandy

On Tue, Jul 8, 2014 at 6:24 PM, Robert James <srobertja...@gmail.com> wrote:
> I have a Spark app which runs well on local master. I'm now ready to
> put it on a cluster. What needs to be installed on the master? What
> needs to be installed on the workers?
>
> If the cluster already has Hadoop or YARN or Cloudera, does it still
> need an install of Spark?