Yoonmin,
Please give http://wiki.apache.org/hadoop/HadoopMapReduce a read; it
covers the read-records-across-blocks scenario.
On Thu, Oct 17, 2013 at 8:47 PM, Yoonmin Nam wrote:
> Hi.
>
> Let us consider this situation:
>
> 1. Block size = 67108864 (64MB)
>
> 2. Data size = 2.2GB. (larger than block size)
> [...]
The overall answer is that InputFormat implementations determine how to
split their data across block boundaries and then handle the reads so
that no record is left incomplete.
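To make that concrete, here is a rough, self-contained sketch (plain
Java over a local file, not the actual Hadoop classes) of the convention
Hadoop's line-oriented readers follow: every split except the first
discards the partial line it starts in, and every split reads past its
nominal end to finish its last line, so each record is read exactly once.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of how a line-oriented record reader copes with a split
    // whose byte range does not line up with record boundaries. It
    // mirrors the rule Hadoop's LineRecordReader uses, but is not the
    // real Hadoop code; records terminated by '\n' are assumed.
    public class BoundaryAwareLineReader {

        static void readSplit(RandomAccessFile file, long start, long length)
                throws IOException {
            long end = start + length;
            file.seek(start);

            // Every split except the first skips the (possibly partial)
            // line it starts in; the previous split owns that record.
            if (start != 0) {
                int b;
                while ((b = file.read()) != -1 && b != '\n') {
                    // consume bytes up to and including the newline
                }
            }

            // Read whole lines. The last line may run past 'end' into
            // the next split's byte range, so no record is truncated.
            while (file.getFilePointer() <= end) {
                String record = file.readLine();
                if (record == null) break; // end of file
                System.out.println("record: " + record);
            }
        }
    }

Run over consecutive (start, length) ranges, this visits every line
exactly once, even when a line straddles a range boundary.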
When splits are generated, they typically don't carry block information;
they have an offset and a length into the file.
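Again as a sketch rather than Hadoop's actual code, split generation can
be pictured as slicing the file purely by byte offset, leaving record
boundaries to the reader; the Split class below is a hypothetical
stand-in for Hadoop's FileSplit, which likewise carries a path, a start
offset, and a length.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of offset-based split generation: record boundaries are
    // ignored here and reconciled later by the record reader.
    public class SplitSketch {

        // Hypothetical stand-in for Hadoop's FileSplit
        // (path, start offset, length).
        static class Split {
            final String path;
            final long start;
            final long length;
            Split(String path, long start, long length) {
                this.path = path;
                this.start = start;
                this.length = length;
            }
        }

        static List<Split> getSplits(String path, long fileLen, long splitSize) {
            List<Split> splits = new ArrayList<>();
            for (long off = 0; off < fileLen; off += splitSize) {
                splits.add(new Split(path, off, Math.min(splitSize, fileLen - off)));
            }
            return splits;
        }
    }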
Hi.
Let us consider this situation:
1. Block size = 67108864 (64MB)
2. Data size = 2.2GB. (larger than block size)
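(As a quick sketch of the arithmetic, assuming 2.2GB here means roughly
2.2 * 2^30 bytes:)

    long blockSize = 67108864L;                              // 64 MiB
    long dataSize  = (long) (2.2 * 1024 * 1024 * 1024);      // ~2,362,232,012 bytes
    long numBlocks = (dataSize + blockSize - 1) / blockSize; // 36 blocks
    long lastBlock = dataSize % blockSize;                   // ~12.8 MiB in the last block

So the file occupies 35 full blocks plus one partial block, and unless a
record happens to end exactly on a 64MB boundary, some records will
straddle two blocks.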
Then, when I put the input into HDFS, I got the block replication
results listed below:
http://infolab.dgist.ac.kr/~ronymin/pictures/1.png
Then, I checked