How do you know where the data exists when you begin?
Mike Segel

> On Oct 28, 2013, at 8:57 PM, "ricky lee" <rickylee0...@gmail.com> wrote:
>
> Hi,
>
> I have a question about maintaining data locality in a MapReduce job launched
> through Yarn. Based on the Yarn tutorial, it seems like an application master
> can specify resource name, memory, and cpu when requesting containers. By
> carefully choosing resource names, I think data locality can be achieved.
> I am curious how the current MapReduce application master does this. Does
> it check all needed blocks for a job and choose a subset of nodes with the most
> needed blocks? If someone can point me to source code snippets that make this
> decision, it would be very much appreciated. thx.
>
> -r
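
To make the "resource name" idea in the question concrete, here is a minimal sketch in Python. It is not the MapReduce application master's code. It assumes the replica hosts for each input split are already known (which is exactly what the question at the top of this reply is asking about), and it tallies how many containers one might request per host, per rack, and for the wildcard name "*" that YARN uses for "any node". The function name, the split names, the node names, and the rack mapping are all hypothetical.

```python
from collections import Counter

ANY = "*"  # YARN's wildcard resource name, meaning "any node"


def build_resource_requests(split_locations, host_to_rack):
    """Tally container requests per resource name (host, rack, ANY).

    split_locations: dict mapping a split id to the hosts holding its replicas.
    host_to_rack:    dict mapping each host to its rack name.
    Both inputs are assumed to be known up front (hypothetical data here).
    """
    requests = Counter()
    for hosts in split_locations.values():
        # One request per host that holds a replica of this split.
        for host in hosts:
            requests[host] += 1
        # One request per distinct rack, as a rack-local fallback.
        for rack in {host_to_rack[h] for h in hosts}:
            requests[rack] += 1
        # One wildcard request, so the task can still run off-rack.
        requests[ANY] += 1
    return requests


# Hypothetical input: two splits, three nodes, two racks.
split_locations = {
    "split-0": ["node1", "node2"],
    "split-1": ["node2", "node3"],
}
host_to_rack = {"node1": "/rack-a", "node2": "/rack-a", "node3": "/rack-b"}

requests = build_resource_requests(split_locations, host_to_rack)
for name, count in sorted(requests.items()):
    print(name, count)
```

In this model nothing picks a "best" subset of nodes ahead of time. Each split simply names every host that holds one of its replicas, and the scheduler is left to grant whichever of those it can. Whether the real application master works this way is something to confirm against the Hadoop source.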