Hi Jeff,

Beyond the HDFS blocks, there is something called an *InputSplit* (a *FileSplit* for file-based input), which is the "logical structure" in your terms. The Mapper operates on an InputSplit through a RecordReader, and that RecordReader is specific to the InputFormat. The InputFormat parses the input and generates the key-value pairs fed to the Mapper.
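To make the boundary handling concrete, here is a small standalone sketch (plain Java, not the actual Hadoop API) of the convention a record reader uses when it is handed a byte-range split over newline-delimited records: every split except the first skips the partial record at its start (it belongs to the previous split), and every reader is allowed to read past its split's end to finish its last record. Hadoop's LineRecordReader follows this same idea; the class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReaderSketch {
    // Return the records "owned" by the split [start, end) of data.
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Splits after the first skip bytes up to and including the first
        // newline: that partial record belongs to the previous split.
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        // Read whole records; the last one may extend beyond 'end'
        // into the next split (i.e., possibly a different HDFS block).
        while (pos < end && pos < data.length) {
            int recStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, recStart, pos - recStart));
            pos++; // move past the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes();
        // A split boundary at byte 8 falls in the middle of "bravo":
        // split 1 reads past its end to finish "bravo", and split 2
        // skips those bytes so the record is produced exactly once.
        System.out.println(readSplit(data, 0, 8));   // [alpha, bravo]
        System.out.println(readSplit(data, 8, 20));  // [charlie]
    }
}
```

Together the two calls cover every record exactly once even though the split boundary cuts a record in half, which is why a mapper's split size need not line up with record (or block) boundaries.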
The InputFormat also handles records that are split across a FileSplit boundary (i.e., that span different blocks). Please check this link for more information: http://wiki.apache.org/hadoop/HadoopMapReduce

Best,
Mahesh Balija,
Calsoft Labs.

On Mon, Dec 3, 2012 at 3:33 AM, Jeff LI <uniquej...@gmail.com> wrote:
> Hello,
>
> I was reading about the relationship between input splits and HDFS blocks,
> and a question came up:
>
> If a logical record crosses an HDFS block boundary, say between block #1
> and block #2, does the mapper assigned to this input split ask for (1) both
> blocks, (2) block #1 and just the part of block #2 that this logical record
> extends into, or (3) block #1 and the part of block #2 up to some sync
> point that covers this particular logical record? Note the input is a
> sequence file.
>
> I guess my question really is: does Hadoop operate on a block basis, or
> does it respect some sort of logical structure within a block when it is
> feeding the mappers with input data?
>
> Cheers
>
> Jeff