What do the logs say?

On Sat, Dec 19, 2015 at 10:08 PM, Marcin Tustin <mtus...@handybook.com> wrote:
> Hi All,
>
> We have just switched over to HA namenodes with ZK failover, using
> HDP-2.3.0.0-2557 (HDFS 2.7.1.2.3). I'm looking for suggestions as to what
> to investigate to make this more stable.
>
> Before we went to HA our namenode was reasonably stable. Now, the
> namenodes are crashing multiple times a day, and frequently failing to fail
> over correctly; to the point where I can't even use
> haadmin -transitionToActive to force a failover. I find that instead I have
> to restart the namenodes.
>
> We're running them on AWS instances with 31.01GB and 8 cores. In addition
> to the namenode, we host a journalnode, a zkfailovercontroller, and the
> ambari metrics collector on the same machine. (The third journalnode lives
> with the yarn resource manager).
>
> Right now the namenodes are configured with a maximum heap of 25 GB.
>
> Does that sound credible? What else should we be paying attention to to
> make HDFS stable again?
>
> With thanks,
> Marcin

--
Regards,
Sandeep Nemuri