If your cluster is dedicated to Spark (running only Spark jobs, with no other workloads such as Hive, Pig, or MapReduce), then Spark standalone would be fine. Otherwise, I think YARN would be the better option.

On Fri, Nov 27, 2015 at 3:36 PM, cs user <acldstk...@gmail.com> wrote:

> Hi All,
>
> Apologies if this question has been asked before. I'd like to know if
> there are any downsides to running spark over yarn with the --master
> yarn-cluster option vs having a separate spark standalone cluster to
> execute jobs?
>
> We're looking at installing a hdfs/hadoop cluster with Ambari and
> submitting jobs to the cluster using yarn, or having an Ambari cluster and
> a separate standalone spark cluster, which will run the spark jobs on data
> within hdfs.
>
> With yarn, will we still get all the benefits of spark?
>
> Will it be possible to process streaming data?
>
> Many thanks in advance for any responses.
>
> Cheers!
>

--
Best Regards

Jeff Zhang