Re: About block splitting, input split and TextInputFormat in MapReduce

2013-10-18 Thread Harsh J
Yoonmin, please give http://wiki.apache.org/hadoop/HadoopMapReduce a read; it covers the read-records-across-blocks scenario. On Thu, Oct 17, 2013 at 8:47 PM, Yoonmin Nam wrote: > Hi. > > Let us consider this situation: > > 1. Block size = 67108864 (64MB) > > 2. Data size = 2.2GB (larger than block size)…
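
To make the convention from the wiki page concrete, here is a standalone sketch using plain java.io (not the actual Hadoop LineRecordReader; the class and method names are illustrative assumptions). A reader assigned the byte range [start, end) of a line-oriented file skips its possibly partial first line unless the split starts at offset 0, and may read its last line past 'end' into the next block, so every line is read exactly once even when lines cross split/block boundaries:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Standalone illustration of the records-across-blocks rule: skip the
    // first line unless start == 0, and allow the last line to run past
    // 'end'. Neighbouring splits compensate for each other.
    public class SplitBoundaryDemo {

        public static void readSplit(String path, long start, long end) throws IOException {
            try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
                raf.seek(start);
                long pos = start;
                if (start != 0) {
                    raf.readLine();           // partial first line: the previous split owns it
                    pos = raf.getFilePointer();
                }
                // Emit every line that starts at or before 'end'; the last line
                // is allowed to run past 'end', i.e. across the block boundary.
                while (pos <= end) {
                    String line = raf.readLine();
                    if (line == null) {       // end of file
                        break;
                    }
                    System.out.println(pos + "\t" + line);
                    pos = raf.getFilePointer();
                }
            }
        }

        public static void main(String[] args) throws IOException {
            // Read the second 64 MB "split" of a local file, i.e. bytes [64 MB, 128 MB).
            long mb64 = 64L * 1024 * 1024;
            readSplit(args[0], mb64, 2 * mb64);
        }
    }

Hadoop's real LineRecordReader follows the same pairing: it always skips the first line when the split does not start at offset 0, and it reads one line whose start position is at or before the end of the split, so adjacent splits neither lose nor duplicate a record.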

RE: About block splitting, input split and TextInputFormat in MapReduce

2013-10-17 Thread Bikas Saha
The overall answer is that InputFormat implementations determine how to split their data across block boundaries and then handle the read so that records are not left incomplete. When splits are generated, they typically don't carry block information; they have an offset and length into the file…
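
To illustrate the offset-and-length point, here is a rough sketch of how a file-based input format can generate splits as plain byte ranges (an illustration under assumptions, not the real FileInputFormat.getSplits() code; the splitsFor method, the splitSize parameter, and the class name are made up). Block locations are consulted only to attach locality hints to each range:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Rough sketch only: a split is just (file, offset, length); block
    // locations are used purely as scheduling hints for locality.
    public class SplitSketch {

        public static List<FileSplit> splitsFor(FileSystem fs, Path file, long splitSize)
                throws IOException {
            FileStatus status = fs.getFileStatus(file);
            long fileLength = status.getLen();
            List<FileSplit> splits = new ArrayList<>();

            long offset = 0;
            while (offset < fileLength) {
                long length = Math.min(splitSize, fileLength - offset);
                // Hosts holding the block that contains the split's start offset.
                BlockLocation[] blocks = fs.getFileBlockLocations(status, offset, length);
                String[] hosts = blocks.length > 0 ? blocks[0].getHosts() : new String[0];
                splits.add(new FileSplit(file, offset, length, hosts));
                offset += length;
            }
            return splits;
        }
    }

The record reader attached to the format is then responsible for resyncing to a record boundary inside its byte range, which is how incomplete records at the edges are avoided.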

About block splitting, input split and TextInputFormat in MapReduce

2013-10-17 Thread Yoonmin Nam
Hi. Let us consider this situation: 1. Block size = 67108864 (64MB) 2. Data size = 2.2GB (larger than block size). Then, when I put the input into HDFS, I got the below list of block replication results: http://infolab.dgist.ac.kr/~ronymin/pictures/1.png Then, I checked…
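
As a rough sanity check against the block list in the screenshot (assuming 2.2GB means roughly 2.2 * 1024^3 bytes; the exact file size will differ), the expected block count works out as follows:

    public class BlockCountDemo {
        public static void main(String[] args) {
            long blockSize = 67_108_864L;                       // 64 MB, as above
            long dataSize = (long) (2.2 * 1024 * 1024 * 1024);  // assumed ~2.2 GB input
            long fullBlocks = dataSize / blockSize;             // 35
            long remainder = dataSize % blockSize;              // ~12.8 MB left over
            long totalBlocks = fullBlocks + (remainder > 0 ? 1 : 0);
            System.out.println("full 64 MB blocks : " + fullBlocks);
            System.out.println("last block (bytes): " + remainder);
            System.out.println("total HDFS blocks : " + totalBlocks); // 36
        }
    }

With the default TextInputFormat the split size matches the block size, so a file of this size should yield about 36 input splits (and map tasks), with lines that cross a block boundary handled by the record reader as described in the replies above.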