Splits are a MapReduce concept. Check out FileInputFormat for an example of how to get block locations. You can then pass these locations into an AMRMClient.ContainerRequest.
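
As a rough illustration of that flow, here is a minimal sketch in plain Java so that it runs without Hadoop on the classpath. `LocalitySketch`, `BlockInfo`, `hostsFor`, and `resourceNames` are hypothetical names made up for this example; `BlockInfo` stands in for Hadoop's `BlockLocation`. In a real application master the hosts would come from `FileSystem.getFileBlockLocations(...)` and `BlockLocation.getHosts()`, and the resulting host array would go into the `nodes` argument of `AMRMClient.ContainerRequest`. Treat it as an assumption-laden outline, not as what the MapReduce AM actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in types: nothing here is a Hadoop class.
class LocalitySketch {

    /** One HDFS block: its byte range plus the hosts holding its replicas. */
    static final class BlockInfo {
        final long offset;
        final long length;
        final List<String> hosts;

        BlockInfo(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /**
     * Collects the replica hosts of every block that overlaps the byte
     * range [start, start + len), keeping first-seen order and dropping
     * duplicates.
     */
    static List<String> hostsFor(List<BlockInfo> blocks, long start, long len) {
        Set<String> hosts = new LinkedHashSet<>();
        long end = start + len;
        for (BlockInfo b : blocks) {
            boolean overlaps = b.offset < end && b.offset + b.length > start;
            if (overlaps) {
                hosts.addAll(b.hosts);
            }
        }
        return new ArrayList<>(hosts);
    }

    /**
     * Resource names to ask for: the replica hosts when they are known,
     * otherwise "*" (any host), which gives up data locality.
     */
    static List<String> resourceNames(List<String> hosts) {
        return hosts.isEmpty() ? Collections.singletonList("*") : hosts;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size
        List<BlockInfo> blocks = Arrays.asList(
            new BlockInfo(0L, blockSize, "node1", "node2", "node3"),
            new BlockInfo(blockSize, blockSize, "node2", "node4", "node5"));

        // Hosts for a range that covers only the first block.
        List<String> hosts = hostsFor(blocks, 0L, blockSize);
        System.out.println("request containers on: " + resourceNames(hosts));
    }
}
```

The point of `hostsFor` is that one request lists every replica host of the data it will read, so the scheduler can satisfy it on any of them.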

-Sandy

On Mon, Oct 28, 2013 at 8:10 PM, ricky l <rickylee0...@gmail.com> wrote:

> Hi Sandy, thank you very much for the information. It is good to know that
> MapReduce AM considers the block location information. BTW, I am not very
> familiar with the concept of splits. Is it specific to MR jobs? If
> possible, code location would be very helpful for reference as I am trying
> to implement an application master that needs to consider HDFS
> data-locality. thx.
>
> r.
>
>
> On Mon, Oct 28, 2013 at 10:21 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> Hi Ricky,
>>
>> The input splits contain the locations of the blocks they cover. The AM
>> gets the information from the input splits and submits requests for those
>> location. Each container request spans all the replicas that the block is
>> located on. Are you interested in something more specific?
>>
>> -Sandy
>>
>>
>> On Mon, Oct 28, 2013 at 7:09 PM, ricky lee <rickylee0...@gmail.com> wrote:
>>
>>> Well, I thought an application master can somewhat ask where the data
>>> exist to a namenode.... isn't it true? If it does not know where the data
>>> reside, does a MapReduce application master specify the resource name as
>>> "*" which means data locality might not be preserved at all? thx,
>>>
>>> r
>>>
>>
>>
>