George has answered most of these. I'll just add on:

On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube <mehul_cho...@symantec.com> wrote:
> 1. Some of the blocks it was managing are deleted/modified?

A DN sends a block report on startup, listing the blocks it holds to the NN. The NN validates the report, and if it finds any files missing block replicas afterwards, it schedules re-replication from one of the good DNs that still carry the block. Blocks modified outside of HDFS fail their stored checksums, so they are treated as corrupt and deleted, and are then re-replicated in the same manner.

> 2. The size of the blocks are now modified say from 64MB to 128MB?

George's got this already. Changing the block size does not impact any existing blocks. It is a per-file metadata property.

> 3. What if the block replication factor was one (yea not in most
> deployments but say incase) so does the namenode recreate a file once the
> datanode rejoins?

Files exist in the NN metadata (its fsimage/edits persist this). The blocks belonging to a file exist on the DNs. If the file had a single replica and that replica was lost, then the file's data is lost, and the NameNode will tell you as much in its metrics/fsck.

--
Harsh J
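
For questions 1 and 3, the reconciliation the NN does after a block report can be sketched roughly as below. This is a toy model in Python and not NameNode code: the function name `reconcile`, the dictionaries, and the block and DN names are all hypothetical, and it assumes corrupt replicas have already been dropped from each report. It only illustrates the two outcomes: an under-replicated block is copied from a surviving DN, while a block with no surviving replica is reported as missing.

```python
from collections import defaultdict

def reconcile(expected, reports):
    """Toy model of block-report reconciliation (hypothetical, not Hadoop code).

    expected: dict block_id -> target replication factor (from NN metadata)
    reports:  dict datanode -> set of block_ids it reported with good checksums
    Returns (to_replicate, missing):
      to_replicate: dict block_id -> (source_datanode, copies_needed)
      missing:      list of block_ids with no live replica at all
    """
    # Build the reverse map: which DNs hold each block.
    holders = defaultdict(list)
    for dn in sorted(reports):
        for block in reports[dn]:
            holders[block].append(dn)

    to_replicate, missing = {}, []
    for block, target in sorted(expected.items()):
        live = holders.get(block, [])
        if not live:
            # No replica left anywhere: the data cannot be recreated.
            missing.append(block)
        elif len(live) < target:
            # Copy from one of the DNs that still carries the block.
            to_replicate[block] = (live[0], target - len(live))
    return to_replicate, missing

# Assumed example state: blk_2 had replication factor 1 and its only copy is gone.
expected = {"blk_1": 3, "blk_2": 1, "blk_3": 2}
reports = {
    "dn1": {"blk_1", "blk_3"},
    "dn2": {"blk_1"},
    "dn3": {"blk_3"},
}
to_replicate, missing = reconcile(expected, reports)
print(to_replicate)  # {'blk_1': ('dn1', 1)}
print(missing)       # ['blk_2']
```

Here `blk_1` gets one more copy scheduled from `dn1`, and `blk_2` shows up as missing because nothing is left to copy from, which is the single-replica case from question 3.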