Check the ZKFC logs first; then look at the HDFS HA and ZooKeeper timeout
settings. It's also better to give the JournalNode service a dedicated disk
(similar to ZooKeeper).
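
For example, the ZKFC's ZooKeeper session timeout lives in core-site.xml and
the JournalNode settings in hdfs-site.xml (these are the standard Hadoop 2.x
property names; the values below are only illustrative, tune them for your
cluster):

    <!-- core-site.xml: how long ZK waits before declaring the ZKFC session dead -->
    <property>
      <name>ha.zookeeper.session-timeout.ms</name>
      <value>10000</value>
    </property>

    <!-- hdfs-site.xml: put the edits dir on its own disk, like the ZK dataDir -->
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/hadoop/journal/hdfs</value>
    </property>

    <!-- hdfs-site.xml: how long the namenode waits on quorum journal writes -->
    <property>
      <name>dfs.qjournal.write-txns.timeout.ms</name>
      <value>20000</value>
    </property>

To see what the failover controller is doing, tail its log (the path shown is
the usual HDP location, adjust as needed) and query the namenode states (nn1
and nn2 are placeholder service IDs from a typical HA config):

    tail -f /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2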

On Sat, Dec 19, 2015 at 9:29 AM, Sandeep Nemuri <nhsande...@gmail.com>
wrote:

> What do the logs say?
>
> On Sat, Dec 19, 2015 at 10:08 PM, Marcin Tustin <mtus...@handybook.com>
> wrote:
>
>> Hi All,
>>
>> We have just switched over to HA namenodes with ZK failover, using 
>> HDP-2.3.0.0-2557
>> (HDFS 2.7.1.2.3). I'm looking for suggestions as to what to investigate to
>> make this more stable.
>>
>> Before we went to HA, our namenode was reasonably stable. Now the
>> namenodes are crashing multiple times a day and frequently failing to fail
>> over correctly, to the point where I can't even use haadmin
>> -transitionToActive to force a failover. Instead, I find I have to
>> restart the namenodes.
>>
>> We're running them on AWS instances with 31.01 GB of RAM and 8 cores. In
>> addition to the namenode, we host a JournalNode, a ZKFailoverController,
>> and the Ambari Metrics Collector on the same machine. (The third
>> JournalNode lives with the YARN ResourceManager.)
>>
>> Right now the namenodes are configured with a maximum heap of 25 GB.
>>
>> Does that sound credible? What else should we be paying attention to in
>> order to make HDFS stable again?
>>
>> With thanks,
>> Marcin
>>
>>
>> Want to work at Handy? Check out our culture deck and open roles
>> <http://www.handy.com/careers>
>> Latest news <http://www.handy.com/press> at Handy
>> Handy just raised $50m
>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>  led
>> by Fidelity
>>
>>
>
>
> --
> Regards
> Sandeep Nemuri
>
