The sender datanode sends the crc along with the data. This allows the
receiver datanode to detect corrupt data. The original crc was created by the
client that created the data in the block for the first time. The crc is not
kept in the namenode. To facilitate random access, there is a crc per 512
On Sep 9, 2009, at 10:25 PM, Dhruba Borthakur wrote:
when a block is being received by a datanode (either because of a
replication request or from a client write), the datanode verifies
crc.
Ah, so I'm wrong and the answer is better than I expected. Never have
I been so happy to be wrong.
when a block is being received by a datanode (either because of a
replication request or from a client write), the datanode verifies crc.
Also, there is a thread in the datanode that periodically verifies crc
of existing blocks.
dhruba
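
For readers who want to see the idea concretely, here is a minimal sketch of per-chunk crc verification as a receiver might do it. It is not the HDFS implementation: the function names are hypothetical, the 512-byte chunk size is an assumption based on the figure mentioned in this thread, and Python's zlib.crc32 stands in for whatever checksum the datanode actually computes.

```python
import zlib

# Assumed chunk size in bytes; one crc is kept per chunk.
CHUNK_SIZE = 512


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return one CRC32 value per chunk_size-byte slice of data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(
    data: bytes, expected: list[int], chunk_size: int = CHUNK_SIZE
) -> list[int]:
    """Return the indexes of chunks whose CRC32 differs from the expected value.

    'expected' plays the role of the checksums that travel with the data.
    """
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("checksum count does not match the data length")
    return [
        index
        for index, (got, want) in enumerate(zip(actual, expected))
        if got != want
    ]
```

Because each chunk carries its own crc, a reader can verify just the chunks it touches rather than the whole block, and a mismatch points at the specific chunk that went bad.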
On Wed, Sep 9, 2009 at 7:27 PM, Brian Bockelman wrote:
Hey everyone,
We're going through a review of our usage of HDFS (it's a good thing!
- we're trying to get "official"). One reviewer asked a good question
that I don't know the answer to - could you help? To quote,
"What steps do you take to ensure the block rebalancing produces non-
cor