Hello,
I had an interesting problem come up recently. We have a few thousand
regions across 8 datanode/regionservers. I made a change, increasing
the Hadoop heap size from 128M to 2048M, which ended up bringing the
cluster to a complete halt after about an hour. I reverted back to 128M
and turned things back on, but didn't realize at the time that the
cluster had come back up with 9 fewer regions than it started with.
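For concreteness, the change was the equivalent of the following in
conf/hadoop-env.sh (the value is in MB):

    # The maximum amount of heap to use, in MB.
    export HADOOP_HEAPSIZE=2048    # previously 128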
Upon further investigation, I found that all 9 missing regions were
from splits that occurred while the cluster was running, after the heap
change was made and before things came to a halt. There was a 10th
region (5 splits were involved in total) that managed to get recovered.

The really odd thing is that for the other 9 regions, the original
parent regions, which as far as I can tell from the logs had been
deleted, were re-opened when I restarted things. The daughter regions
were gone. Interestingly, I found the orphaned data blocks still
intact, and in at least some cases I have been able to extract the data
from them and will hopefully re-add it to the tables.
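As a rough sketch of what I mean by locating them: each region is a
directory under its table's directory in HDFS, so a listing like the
one below can be compared against the regions .META. knows about. This
is only a sketch; it assumes the default /hbase root dir and takes the
table name as an argument:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListRegionDirs {
      public static void main(String[] args) throws Exception {
        // Assumes hbase.rootdir is /hbase; table name is args[0].
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/hbase", args[0]))) {
          // Each subdirectory is one region; a directory with no
          // matching row in .META. is an orphan left behind by a split.
          System.out.println(status.getPath().getName());
        }
      }
    }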
My question is this: based on the rather muddled description I've given
above, does anyone know what could possibly have happened here? My best
guess is that the bad state HDFS was in caused some critical step of
the split process to be skipped, which resulted in the references to
the parent regions sticking around while the references to the daughter
regions were lost.
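If that guess is right, the re-opened parents should still be sitting
in .META. carrying their daughter references. Something like the sketch
below should surface them; this is against the 0.20-style client API
and assumes the standard info:splitA/info:splitB catalog columns, so
treat it as illustrative rather than something I've verified:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FindSplitParents {
      public static void main(String[] args) throws Exception {
        HTable meta = new HTable(new HBaseConfiguration(), ".META.");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("info"));
        ResultScanner scanner = meta.getScanner(scan);
        try {
          for (Result row : scanner) {
            // A split parent keeps serialized HRegionInfo for its
            // daughters under info:splitA and info:splitB until the
            // daughters stop referencing it and it gets cleaned up.
            byte[] splitA = row.getValue(Bytes.toBytes("info"),
                Bytes.toBytes("splitA"));
            byte[] splitB = row.getValue(Bytes.toBytes("info"),
                Bytes.toBytes("splitB"));
            if (splitA != null || splitB != null) {
              System.out.println("split parent still in .META.: "
                  + Bytes.toString(row.getRow()));
            }
          }
        } finally {
          scanner.close();
        }
      }
    }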
Thanks for any insight you can provide.
--Brennon