As far as I understand, the current focus is on how to reduce namenode's
CPU time to process block reports from a lot of datanodes.
Aren't we missing another issue? Doesn't the way a block report is
computed delay the master startup time? I have to make sure the master
is up as quickly as possible.
You bring up a good point. Creating and processing block reports does
take a lot of resources, and it affects DFS scalability and performance
to some extent. Here are some more details:
http://issues.apache.org/jira/browse/HADOOP-1079
There is one thread in the Datanode that sends block
reports: memory vs. file system, and Dividing
offerService into 2 threads
dhruba Borthakur wrote:
My current thinking is that block report processing should compare the
blkxxx files on disk with the data structure in the Datanode memory. If
and only if there is some discrepancy between these two