This blog post on YDN website
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/has
detailed discussion on different steps involved in Hadoop IO
operations
and opportunities for optimizations. Could someone please comment on current
state of these potential
Hi Shrinivas,
There has been some work going on recently around optimizing checksums. See
HDFS-2080 for example. This will help both the write and read code, though
we've focused more on read.
There have also been a lot of improvements around random read access - for
example HDFS-941 which