Hi all,

Is it appropriate to send my issue to this mailing list?

I'm using Hadoop v2.2 on a cluster of 16 servers.
It has about 5,000,000 files.

10 days ago, the secondary NameNode went down because of an OOM error, and I did not notice; my fault.

Yesterday I changed some configs and restarted Hadoop, and that is when the disaster struck. The NameNode edit logs had grown to 35 GB (2,000,000 transactions) because the secondary NameNode had not been committing checkpoints. When the NameNode started, it read the edit logs and replayed them. The replay ran for a long time (12h+), got slower and slower, and finally crashed with an OOM error.

I have now given it more memory and restarted it. At the current replay speed (1000-2000 tps), it may take 60h+ to come back online.
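
For reference, an estimate like this is just the number of transactions still to replay divided by the replay rate. Below is a minimal sketch of that arithmetic. The function name and the remaining-transaction count are hypothetical placeholders for illustration, not figures measured on my cluster, and the sketch assumes the replay rate stays constant, which it did not during the first attempt.

```python
def eta_hours(remaining_txns: int, tps: float) -> float:
    """Estimate hours needed to replay `remaining_txns` at `tps` transactions/second.

    Assumes a constant replay rate; a slowing replay will take longer.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    seconds = remaining_txns / tps
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical backlog, chosen only to illustrate the calculation.
    remaining = 216_000_000
    for rate in (1000, 2000):
        print(f"{rate} tps -> {eta_hours(remaining, rate):.1f} h")
```

If the rate keeps dropping as it did before the crash, a constant-rate estimate like this is only a lower bound.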

Is this speed normal?
What can I do to speed it up?
Any help would be appreciated.

Liu Cong
