I was going through this link:
http://stackoverflow.com/questions/9406477/data-integrity-in-hdfs-which-data-nodes-verifies-the-checksum
It says that in recent versions of Hadoop, only the last data node verifies the checksum, since the write happens in a pipeline fashion.
Now I have a question:
Hi Reena,
the pipeline is per block. If half of your file lives on data node A only, that means the pipeline for that block had a single node (node A, probably because the replication factor is set to 1), and so data node A holds the checksums for its block. The same applies to data node B.
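To make the per-block checksum idea concrete: each datanode stores, alongside every block replica, a checksum for each fixed-size chunk of that block (HDFS chunks default to 512 bytes, configurable via dfs.bytes-per-checksum). Below is a minimal Python sketch of that scheme, not Hadoop's actual implementation (real HDFS uses CRC32C and stores the sums in a separate .meta file); it just shows how a datanode-style per-chunk checksum and verification would work:

```python
import zlib

BYTES_PER_CHECKSUM = 512  # mirrors the HDFS default chunk size


def chunk_checksums(block: bytes) -> list[int]:
    """Compute one CRC32 per chunk, like the sums kept next to a block replica."""
    return [
        zlib.crc32(block[i:i + BYTES_PER_CHECKSUM])
        for i in range(0, len(block), BYTES_PER_CHECKSUM)
    ]


def verify(block: bytes, checksums: list[int]) -> bool:
    """Re-checksum the block and compare, as the last datanode in the
    write pipeline (or a later reader) would."""
    return chunk_checksums(block) == checksums


# A 1280-byte "block" -> three chunks (512 + 512 + 256 bytes).
block = bytes(range(256)) * 5
sums = chunk_checksums(block)
print(len(sums))            # 3 chunks
print(verify(block, sums))  # True

# Flip one byte: verification now fails, which is how corruption is caught.
corrupted = block[:100] + bytes([block[100] ^ 0xFF]) + block[101:]
print(verify(corrupted, sums))  # False
```

Because the checksums travel with each block replica, every datanode that holds a replica can independently verify its own data, which is exactly why node A and node B each have the sums for their own block.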