What happens when MapReduce (MR) produces input splits that don't align on
HDFS block boundaries?  I've read that MR tries to place split boundaries at
or near block boundaries to improve data locality, but isn't there always
some slop where a record straddles a block boundary, so the reader has to
open an extra HDFS connection just to fetch the remainder of that record
from the next block?  Does this impact performance?  Are there file formats
that attempt to enforce record alignment with blocks?
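
To make the straddling case concrete, here is a minimal sketch in Python of
how I understand a line-oriented record reader to handle it. It is a
simulation over an in-memory byte string, not Hadoop code: the function name
read_split, the 8-byte "block size", and the sample data are all made up for
illustration. The assumed rule is that every split except the first discards
its leading partial line, and every split keeps reading past its own end to
finish the last record it started. The read past `end` is the part that
would touch the next block.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-delimited records owned by split [start, start+length).

    Hypothetical helper: simulates a line record reader over a byte string.
    """
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: skip through the first newline at or after
        # `start`. That (possibly partial) line belongs to the previous split.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1

    records = []
    # A record is owned by this split if it begins at or before `end`,
    # even when its bytes run past `end` into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


# Made-up sample: four records, splits of 8 bytes that cut through records.
DATA = b"alpha\nbravo\ncharlie\ndelta\n"
BLOCK = 8

all_records = []
for offset in range(0, len(DATA), BLOCK):
    size = min(BLOCK, len(DATA) - offset)
    all_records.extend(read_split(DATA, offset, size))
```

If that rule is right, every record is read exactly once, and the cost of a
straddling record is one extra read of at most one record's worth of bytes
from the following block per split.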
