Re: HDFS data and non-aligned splits

2013-05-23 Thread Harsh J
> What happens when MR produces data splits, and those splits don’t align on block boundaries?

Answer depends on the file format used here. With any of the formats we ship, nothing happens.

> but isn’t there always some slop where records straddle the block boundaries, resulting in an extra HDFS
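
To illustrate why a non-aligned split need not lose or duplicate records, here is a minimal sketch of one common convention for newline-delimited text. It is an illustration under stated assumptions, not Hadoop's actual code: the function name `read_split` is hypothetical, and an in-memory `bytes` object stands in for the file. Every split except the first discards its leading partial line, and every split keeps reading past its own end until the line it is in the middle of is complete, which may mean fetching a few bytes that live in the next block.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by the split [start, start + length).

    Hypothetical helper for illustration only. Assumes records are
    separated by a single b"\n" and that `data` is the whole file.
    """
    end = start + length
    pos = start

    if start != 0:
        # Not the first split: the bytes up to the next newline belong to
        # a line that the previous split will finish, so discard them.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # A line is owned by this split if it begins at or before `end`.
    # The last such line may extend beyond `end`; it is read in full.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            # Final line without a trailing newline.
            records.append(data[pos:])
            pos = len(data)
        else:
            records.append(data[pos:newline])
            pos = newline + 1
    return records


if __name__ == "__main__":
    sample = b"alpha\nbravo\ncharlie\ndelta\n"
    split_size = 10  # deliberately not aligned with line boundaries
    for offset in range(0, len(sample), split_size):
        print(offset, read_split(sample, offset, split_size))
```

With this convention each line is read by exactly one split, whatever the split size, at the cost of a small read past the split's end.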

RE: HDFS data and non-aligned splits

2013-05-23 Thread John Lilley
nt: Thursday, May 23, 2013 11:53 AM
To: user@hadoop.apache.org
Subject: HDFS data and non-aligned splits

What happens when MR produces data splits, and those splits don't align on block boundaries? I've read that MR will attempt to make data splits near block boundaries to improve data lo

HDFS data and non-aligned splits

2013-05-23 Thread John Lilley
What happens when MR produces data splits, and those splits don't align on block boundaries? I've read that MR will attempt to make data splits near block boundaries to improve data locality, but isn't there always some slop where records straddle the block boundaries, resulting in an extra HDF