I have a basic question: how does Spark assign partitions to executors?
Does it respect data locality? And does this behaviour depend on the
cluster manager, i.e. YARN vs. standalone?
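One way to see the placement Spark would prefer is to ask the RDD itself:
for an HDFS-backed RDD, preferredLocations returns the DataNodes holding
each block's replicas, whichever cluster manager you run on. A minimal
sketch (the object name and HDFS path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object PreferredLocationsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("preferred-locations-demo"))
    // Placeholder path; any replicated file on the cluster's HDFS will do.
    val rdd = sc.textFile("hdfs:///data/sample.txt")
    // preferredLocations(p) lists the hosts the scheduler tries first for
    // partition p -- for HDFS input, the DataNodes holding that block.
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index} -> ${rdd.preferredLocations(p).mkString(", ")}")
    }
    sc.stop()
  }
}

Whether a task actually lands on one of those hosts depends on which
executors the cluster manager gave you; the scheduler can only exploit
locality if an executor happens to be running on (or near) a preferred host.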
On 22 Jun 2015 22:45, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

> Option 1 should be fine; Option 2 would put a heavy load on the network
> as the data grows over time.
>
> Thanks
> Best Regards
>
> On Mon, Jun 22, 2015 at 5:59 PM, Ashish Soni <asoni.le...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> What is the best way to install a Spark cluster alongside a Hadoop
>> cluster? Any recommendation on the deployment topologies below would
>> be a great help.
>>
>> *Also, is it necessary to put the Spark Workers on the DataNodes so
>> that blocks read from HDFS are local to the server/worker? Or can I
>> put the Workers on other nodes, and if I do, will it affect the
>> performance of the Spark data processing?*
>>
>> Hadoop Option 1
>>
>> Server 1 - NameNode & Spark Master
>> Server 2 - DataNode 1 & Spark Worker
>> Server 3 - DataNode 2 & Spark Worker
>> Server 4 - DataNode 3 & Spark Worker
>>
>> Hadoop Option 2
>>
>> Server 1 - NameNode
>> Server 2 - Spark Master & DataNode 1
>> Server 3 - DataNode 2
>> Server 4 - DataNode 3
>> Server 5 - Spark Worker 1
>> Server 6 - Spark Worker 2
>> Server 7 - Spark Worker 3
>>
>> Thanks.
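A related knob when the workers do sit on the DataNodes (Option 1): the
locality waits, which control how long the scheduler holds out for a
node-local slot before falling back to rack-local and then any host. A
hedged sketch in Scala; the values shown are Spark's defaults, not a
recommendation:

import org.apache.spark.{SparkConf, SparkContext}

// Locality tuning only pays off when workers are co-located with the
// DataNodes, as in Option 1; with Option 2 every HDFS read crosses the
// network regardless. The values below are illustrative defaults.
val conf = new SparkConf()
  .setAppName("colocated-job")
  .set("spark.locality.wait", "3s")       // base wait per locality level
  .set("spark.locality.wait.node", "3s")  // override for NODE_LOCAL
  .set("spark.locality.wait.rack", "3s")  // override for RACK_LOCAL
val sc = new SparkContext(conf)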