HDFS HA Namenodes crash all the time

Marcin Tustin Sat, 19 Dec 2015 08:39:57 -0800

Hi All,

We have just switched over to HA namenodes with ZK failover, using
HDP-2.3.0.0-2557 (HDFS 2.7.1.2.3). I'm looking for suggestions as to what to
investigate to make this more stable.


Before we went to HA, our namenode was reasonably stable. Now the namenodes
crash multiple times a day and frequently fail to fail over correctly, to the
point where I can't even use haadmin -transitionToActive to force a failover.
Instead, I have to restart the namenodes.
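
For anyone trying to reproduce the failover symptom, one thing worth capturing
when it happens is what each namenode reports about its own HA state. The
sketch below only builds (and optionally runs) the read-only haadmin queries
-getServiceState and -checkHealth. The service IDs nn1 and nn2 are
hypothetical placeholders (the real ones come from dfs.ha.namenodes.<nameservice>
in hdfs-site.xml), and the helper names are made up for illustration.

```python
import subprocess

# Read-only haadmin sub-commands; nothing here changes HA state.
READ_ONLY_ACTIONS = {"-getServiceState", "-checkHealth"}


def haadmin_command(action, service_id):
    """Build the argv list for a read-only `hdfs haadmin` query."""
    if action not in READ_ONLY_ACTIONS:
        raise ValueError(f"refusing non-read-only action: {action}")
    return ["hdfs", "haadmin", action, service_id]


def run_haadmin(action, service_id, timeout_s=30):
    """Run the query and return (exit code, stdout, stderr).

    Requires the `hdfs` client on PATH with the cluster configuration.
    """
    result = subprocess.run(
        haadmin_command(action, service_id),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    # nn1 / nn2 are ASSUMED service IDs, not taken from this cluster.
    for service_id in ("nn1", "nn2"):
        for action in ("-getServiceState", "-checkHealth"):
            # Print the command only; call run_haadmin() on a real cluster host.
            print(" ".join(haadmin_command(action, service_id)))
```

Logging these results next to the namenode and zkfailovercontroller logs at
the time of a failed failover should show whether both namenodes were standby,
both unhealthy, or simply unreachable.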

We're running them on AWS instances with 31.01 GB of RAM and 8 cores. In
addition to the namenode, each machine hosts a journalnode, a
zkfailovercontroller, and the ambari metrics collector. (The third journalnode
lives with the YARN resource manager.)

Right now the namenodes are configured with a maximum heap of 25 GB.
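
To make the memory budget concrete, here is a rough sketch of the arithmetic
for one namenode host. The 31.01 GB and 25 GB figures are the ones given
above; the heap sizes for the co-located daemons are placeholder assumptions,
not measured values, and should be replaced with the real settings.

```python
# Rough memory budget for one namenode host (all figures in GB).
TOTAL_RAM_GB = 31.01          # instance memory, as stated above
NAMENODE_MAX_HEAP_GB = 25.0   # configured namenode maximum heap

# ASSUMED placeholder heaps for the co-located daemons; substitute the
# actual -Xmx values from your own configuration.
ASSUMED_OTHER_HEAPS_GB = {
    "journalnode": 1.0,
    "zkfailovercontroller": 1.0,
    "ambari_metrics_collector": 1.0,
}


def headroom_gb(total_ram, namenode_heap, other_heaps):
    """RAM left after the namenode heap and the other daemons' heaps.

    This counts heap only. Each JVM also needs non-heap memory (metaspace,
    thread stacks, GC bookkeeping), and the OS needs room for page cache,
    so the usable headroom is smaller than the number returned here.
    """
    return total_ram - namenode_heap - sum(other_heaps.values())


remaining = headroom_gb(TOTAL_RAM_GB, NAMENODE_MAX_HEAP_GB, ASSUMED_OTHER_HEAPS_GB)
print(f"Heap-only headroom under these assumptions: {remaining:.2f} GB")
```

If the result is only a few GB, the host may be swapping or stalling under
load, which would be one thing to rule out before looking further.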

Does that sound credible? What else should we be looking at to make HDFS
stable again?

With thanks,
Marcin

