Hi Костарев,
  I think it should work for YARN, even though YARN doesn't support a layer 
above rack for now (I am actually working on supporting multi-layer topologies 
for YARN in YARN-18).
  Current YARN should simply recognize your topology as three racks: "dc1/rack1", 
"dc2/rack1", "dc2/rack2". Each node (NM) with free resources should be assigned 
containers on its heartbeat with the RM, regardless of locality level. The 
only exceptions should be: 1. there are no pending resource requests; 2. the NM's 
capacity is too small to meet the resource request; 3. delay scheduling is 
enabled and there is no data-local attempt. In your case, I don't see anything 
that would stop task assignment on a1 and a2. Anyone here, please correct me if 
I have misunderstood something. :)
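
  The three exceptions above can be sketched as a small check run on each node 
heartbeat. This is only a toy model to make the reasoning concrete, not YARN's 
actual scheduler code: the Request class, the skip_reason function, and the 
skipped/max_skips counters are all hypothetical names, and memory is the only 
resource considered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """One pending container request (hypothetical, simplified)."""
    memory_mb: int        # memory asked for by the container
    preferred_host: str   # host that holds the input data


def skip_reason(node_host: str,
                free_mb: int,
                pending: List[Request],
                delay_enabled: bool = False,
                skipped: int = 0,
                max_skips: int = 0) -> Optional[str]:
    """Return why the node gets no container on this heartbeat, or None if it gets one."""
    # Exception 1: nothing is waiting to be scheduled.
    if not pending:
        return "no pending resource requests"

    # Exception 2: no pending request fits into the node's free capacity.
    fitting = [r for r in pending if r.memory_mb <= free_mb]
    if not fitting:
        return "node capacity too small"

    # Exception 3: delay scheduling holds back non-local assignments
    # until the request has been skipped max_skips times.
    if delay_enabled:
        has_local = any(r.preferred_host == node_host for r in fitting)
        if not has_local and skipped < max_skips:
            return "delay scheduling: waiting for a data-local node"

    return None


if __name__ == "__main__":
    pending = [Request(memory_mb=1024, preferred_host="b1")]
    # a1 has room and delay scheduling is off, so nothing blocks assignment.
    print(skip_reason("a1", 4096, pending))
```

With delay scheduling off and enough free memory, the check returns None for 
a1, which matches the expectation that a1 and a2 should receive containers.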
  Anyway, I will give it a try (with your configuration) later to see whether 
there are bugs in boundary cases or whether it could be a misconfiguration. 
Which minor version (2.0.x or trunk) are you using now?
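
  For reproducing the layout, a topology script along the following lines could 
be used. It is a minimal sketch under several assumptions: that the script is 
the one referenced by net.topology.script.file.name, that nodes are passed to it 
by hostname (in practice Hadoop may pass IP addresses, so a real script would 
need those as keys too), that the rack paths are written with a leading slash, 
and that b2 sits in dc2/rack1. The RACKS table and the resolve function are 
illustrative names only.

```python
#!/usr/bin/env python3
"""Toy rack-awareness script: prints one rack path per host given as an argument."""
import sys

# Assumed host-to-rack mapping for the two-datacenter layout in this thread.
# The placement of b2 is a guess; adjust it to the real cluster.
RACKS = {
    "a1": "/dc1/rack1",
    "a2": "/dc1/rack1",
    "b1": "/dc2/rack1",
    "b2": "/dc2/rack1",
    "b3": "/dc2/rack2",
}

# Fallback used for hosts that are not in the table.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Map each host name to its rack path, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop calls the script with one or more hosts and reads one rack per line.
    print("\n".join(resolve(sys.argv[1:])))
```

Because the datacenter name is just part of the rack string here, the scheduler 
sees three flat racks rather than a two-level hierarchy.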

Thanks,

Junping

----- Original Message -----
From: "Костарев А.Ф." <[email protected]>
To: [email protected]
Sent: Tuesday, July 9, 2013 5:48:49 PM
Subject: Algorithm of distribution Map and Reduce tasks at various topology of 
a network

Hi
I have a cluster in two datacenters:

           CLUSTER
              |
     +--------+---------+
     |                  |
datacenter1        datacenter2
     |                  |
   rack1               rack1
       |                |  |
       +-a1             |  +-b1
       |                |  |
       +-a2             |  +-b2
                        |
                       rack2
                           +-b3


The cluster has a file with replication factor 5.
All of the file's blocks reside on all servers of the cluster.

When I work with standard MapReduce (MRv1) (called on b1), Map and 
Reduce tasks run on all servers: b1, b2, b3, a1, a2.
When I work with YARN (MRv2) (called on b1), Map and Reduce tasks run 
only on b1, b2, b3.

Can I run Map tasks on all servers in YARN?


-- 
Consultant, 1st category
Костарев А.Ф
