I think this is how it works: RDDs have partitions, which are backed by
blocks, and the BlockManager knows where those blocks live. Based on that
availability (PROCESS_LOCAL, NODE_LOCAL, etc.), Spark launches the tasks on
those nodes. This behaviour can be tuned with spark.locality.wait.
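
For example, something like this (just a rough sketch; the values here are
made up and the defaults are usually fine):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values; 3s is the default wait before the scheduler falls
// back to the next (less local) level.
val conf = new SparkConf()
  .setAppName("locality-demo")
  .set("spark.locality.wait", "3s")
  .set("spark.locality.wait.process", "3s") // PROCESS_LOCAL -> NODE_LOCAL
  .set("spark.locality.wait.node", "3s")    // NODE_LOCAL -> RACK_LOCAL
  .set("spark.locality.wait.rack", "3s")    // RACK_LOCAL -> ANY
val sc = new SparkContext(conf)
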
On 22 Jun 2015 19:21, "ayan guha" <guha.a...@gmail.com> wrote:

I have a basic question: how does Spark assign partitions to executors? Does
it respect data locality? Does this behaviour depend on the cluster manager,
i.e. YARN vs standalone?
On 22 Jun 2015 22:45, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

> Option 1 should be fine. Option 2 would put a lot of load on the network as
> the data grows over time.
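>
> A rough way to see this (untested sketch; the HDFS path below is just a
> placeholder): preferredLocations reports which hosts hold the block behind
> each partition. With Option 1 those hosts also run Spark workers, so tasks
> can be scheduled NODE_LOCAL; with Option 2 every block has to cross the
> network.
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(new SparkConf().setAppName("locality-check"))
>
> // Placeholder input; each partition of a textFile maps to an HDFS block.
> val rdd = sc.textFile("hdfs:///some/input/path")
>
> // Hosts holding the block behind each partition. If these are also worker
> // hosts (Option 1), tasks can run NODE_LOCAL; otherwise (Option 2) every
> // read is remote.
> rdd.partitions.foreach { p =>
>   println(s"partition ${p.index} -> ${rdd.preferredLocations(p).mkString(", ")}")
> }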
>
> Thanks
> Best Regards
>
> On Mon, Jun 22, 2015 at 5:59 PM, Ashish Soni <asoni.le...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> What is the best way to install a Spark cluster alongside a Hadoop
>> cluster? Any recommendation on the deployment topologies below would be a
>> great help.
>>
>> *Also, is it necessary to put the Spark Workers on the DataNodes, so that
>> when they read blocks from HDFS the reads are local to the server/worker?
>> Or can I put the Workers on other nodes, and if I do that, will it affect
>> the performance of the Spark data processing?*
>>
>> Hadoop Option 1
>>
>> Server 1 - NameNode   & Spark Master
>> Server 2 - DataNode 1  & Spark Worker
>> Server 3 - DataNode 2  & Spark Worker
>> Server 4 - DataNode 3  & Spark Worker
>>
>> Hadoop Option 2
>>
>> Server 1 - NameNode
>> Server 2 - Spark Master
>> Server 3 - DataNode 1
>> Server 4 - DataNode 2
>> Server 5 - DataNode 3
>> Server 6 - Spark Worker 1
>> Server 7 - Spark Worker 2
>> Server 8 - Spark Worker 3
>>
>> Thanks.
