How do you know where the data exists when you begin?

Mike Segel

> On Oct 28, 2013, at 8:57 PM, "ricky lee" <rickylee0...@gmail.com> wrote:
> 
> Hi,
> 
> I have a question about maintaining data locality in a MapReduce job launched
> through YARN. Based on the YARN tutorial, it seems that an application master
> can specify a resource name, memory, and CPU when requesting containers. I
> think data locality can be achieved by carefully choosing the resource names.
> I am curious how the current MapReduce application master does this. Does
> it check all the blocks a job needs and choose the subset of nodes holding the
> most of them? If someone can point me to the source code that makes this
> decision, it would be very much appreciated. Thanks.
> 
> -r
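
On the question of where the data is at the start: in Hadoop, each input split typically records the hosts that store its blocks, as reported by the HDFS NameNode when the splits are computed, so the locations are known before any container is requested. The Python sketch below is a toy model under that assumption, not the MapReduce application master's code. It shows one way an application master could turn per-task host lists into host, rack, and `*` (any node) request counts, and then hand each allocated container to a node-local task first, a rack-local task second, and any remaining task last. `MapTask`, `build_requests`, `assign`, and the host and rack names are hypothetical; real requests also carry a priority, memory, and vcores.

```python
from collections import Counter
from dataclasses import dataclass

ANY = "*"  # wildcard resource name meaning "any node"


@dataclass
class MapTask:
    task_id: str
    hosts: list[str]  # assumed: hosts holding replicas of this task's input split


def build_requests(tasks, rack_of):
    """Count requested containers per resource name (host, rack, or ANY).

    Each task adds one to every preferred host, one to every distinct rack
    those hosts sit in, and one to ANY, so the scheduler may place it at
    node, rack, or cluster level.
    """
    counts = Counter()
    for t in tasks:
        for h in t.hosts:
            counts[h] += 1
        for r in {rack_of[h] for h in t.hosts}:
            counts[r] += 1
        counts[ANY] += 1
    return counts


def assign(container_host, pending, rack_of):
    """Pick a pending task for a container that landed on container_host.

    Tries node-local, then rack-local, then any task. Removes the chosen
    task from `pending` and returns (task, locality), or (None, None).
    """
    for t in pending:
        if container_host in t.hosts:
            pending.remove(t)
            return t, "NODE_LOCAL"
    rack = rack_of.get(container_host)
    if rack is not None:
        for t in pending:
            if any(rack_of.get(h) == rack for h in t.hosts):
                pending.remove(t)
                return t, "RACK_LOCAL"
    if pending:
        return pending.pop(0), "OFF_SWITCH"
    return None, None


if __name__ == "__main__":
    rack_of = {"h1": "/r1", "h2": "/r1", "h3": "/r2", "h4": "/r2"}
    tasks = [MapTask("m0", ["h1", "h3"]), MapTask("m1", ["h2", "h4"]),
             MapTask("m2", ["h3"])]
    print(build_requests(tasks, rack_of))
    pending = list(tasks)
    for host in ["h3", "h2", "h4"]:
        task, locality = assign(host, pending, rack_of)
        print(host, task.task_id, locality)
```

Note that this model does not pick "the subset of nodes with the most blocks" for the whole job; it expresses a preference per task and matches containers as they arrive, which keeps the logic simple when the scheduler cannot honor every host preference.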
