> What happens when MR produces data splits, and those splits don’t align
> on block boundaries?
The answer depends on the file format in use. With any of the formats we
ship, nothing happens.
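To make that concrete for newline-delimited text, here is a rough Python sketch of the convention a line-oriented record reader follows. It is an illustration under my own assumptions, not the actual Hadoop code: the function name read_split and the in-memory bytes buffer are hypothetical stand-ins for a reader working on an HDFS stream. A split that does not start at offset 0 discards its first (possibly partial) line, and every split keeps reading until it has finished the line that begins at or before its end offset, so each record is read exactly once regardless of where the split boundaries fall.

```python
def read_split(data, start, length):
    """Return the records owned by the split [start, start + length).

    Hypothetical helper: `data` stands in for the whole file's bytes.
    """
    end = start + length
    pos = start
    if start != 0:
        # The previous split reads through the line that crosses (or begins
        # exactly at) this offset, so skip ahead to the next line start.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # Keep any line that begins at or before `end`, even if finishing it
    # means reading past the end of the split (and possibly into the next block).
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
print(read_split(data, 0, 10))   # [b'alpha', b'bravo']
print(read_split(data, 10, 10))  # [b'charlie', b'delta']
print(read_split(data, 20, 6))   # []
```

The second split starts in the middle of "bravo", so it skips that line. The first split had already read past its own end to finish it. That read past the end of a split is the only cost of a misaligned boundary in this sketch.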
> but isn’t there always some slop where records straddle the block
> boundaries, resulting in an extra HDFS
Sent: Thursday, May 23, 2013 11:53 AM
To: user@hadoop.apache.org
Subject: HDFS data and non-aligned splits
What happens when MR produces data splits, and those splits don't align on
block boundaries? I've read that MR will attempt to make data splits near
block boundaries to improve data locality, but isn't there always some slop
where records straddle the block boundaries, resulting in an extra HDF