Re: NameNode Crashing with "flush failed for required journal" exception

2016-05-01 Thread Shaik M
Hi Chris, After installing the "NSCD" service on the Hadoop cluster, the NameNode has been running stably without any downtime for the last three days. :) Thank you for your help. Regards, Shaik On 29 April 2016 at 11:43, Shaik M wrote: > Thank you for your suggestions. > > I found in logs > "WARN security.Gr

Re: NameNode Crashing with "flush failed for required journal" exception

2016-04-28 Thread Shaik M
Thank you for your suggestions. I found this in the logs: "WARN security.Groups (Groups.java:fetchGroupList(244)) - Potential performance problem: getGroups(user=hdfs) took 15915 milliseconds." First I'll deploy the "nscd" service on all three journal nodes and will update you accordingly. Thanks, Shaik On 2
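To check how often this warning shows up, the log lines can be scanned for it. The following is a minimal sketch rather than an official tool: the function name `slow_group_lookups` and the 5000 ms default threshold are assumptions made for illustration, and the pattern is built only from the warning text quoted in the message above.

```python
import re

# Pattern derived from the Groups warning quoted in the message above.
SLOW_GROUPS = re.compile(
    r"Potential performance problem: getGroups\(user=(?P<user>[^)]+)\) "
    r"took (?P<ms>\d+) milliseconds"
)


def slow_group_lookups(lines, threshold_ms=5000):
    """Return (user, milliseconds) for each warning at or above threshold_ms.

    threshold_ms is a hypothetical cut-off chosen for this sketch.
    """
    hits = []
    for line in lines:
        match = SLOW_GROUPS.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            hits.append((match.group("user"), int(match.group("ms"))))
    return hits


# The sample reuses the log line from the message.
sample = [
    "WARN security.Groups (Groups.java:fetchGroupList(244)) - Potential "
    "performance problem: getGroups(user=hdfs) took 15915 milliseconds.",
]
print(slow_group_lookups(sample))
```

In practice the `lines` argument could be an open log file on each JournalNode host. Repeated hits for `user=hdfs` would be consistent with the slow-lookup explanation given earlier in the thread.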

Re: NameNode Crashing with "flush failed for required journal" exception

2016-04-28 Thread Chris Nauroth
A problem I've seen a few times is that slow lookups of the hdfs user's groups at the JournalNode introduce delays in handling the edit logging RPC, which then times out at the NameNode side, ultimately causing an abort and an HA failover. If your environment is experiencing this, then you'll see
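One way to see whether group resolution is slow on a given host is to time it directly. The sketch below is illustrative only: `timed_lookup` and `os_groups` are hypothetical helper names, and `os_groups` measures the operating system's lookup through Python's standard library, which may differ from the group mapping mechanism Hadoop is configured to use.

```python
import time


def timed_lookup(lookup, user):
    """Call lookup(user) and return (result, elapsed_milliseconds)."""
    start = time.monotonic()
    result = lookup(user)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return result, elapsed_ms


def os_groups(user):
    """Resolve a user's group names through the OS (Unix only).

    Assumption: this approximates, but is not identical to, the lookup
    a Hadoop daemon performs.
    """
    import grp
    import os
    import pwd

    primary_gid = pwd.getpwnam(user).pw_gid
    return [grp.getgrgid(gid).gr_name
            for gid in os.getgrouplist(user, primary_gid)]


def fake_slow_lookup(user):
    """Stand-in lookup so the timing helper can be shown without a real user."""
    time.sleep(0.05)
    return ["hadoop"]


groups, elapsed = timed_lookup(fake_slow_lookup, "hdfs")
print(groups, round(elapsed), "ms")
```

On a JournalNode host, `timed_lookup(os_groups, "hdfs")` would time the real lookup. A result in the range of several seconds would point in the same direction as the delay described above.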

Re: NameNode Crashing with "flush failed for required journal" exception

2016-04-28 Thread Gagan Brahmi
Hi Shaik, The error basically indicates that the NameNode crashed while waiting for the write and sync to complete on a quorum of JournalNodes. In your case at least 2 JournalNodes should complete the write and sync within the timeout period of 20 seconds, which does not seem to be the case. I will advise
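The quorum condition described above can be expressed as a small check. This is a sketch for illustration, not HDFS code: `quorum_reached` is a hypothetical function, the latency values are made up, and the 20000 ms default simply mirrors the 20-second timeout mentioned in the message.

```python
def quorum_reached(latencies_ms, timeout_ms=20000):
    """Return True if a majority of JournalNodes acknowledged in time.

    latencies_ms holds one entry per JournalNode; None means the node
    never responded. All values here are illustrative.
    """
    needed = len(latencies_ms) // 2 + 1  # majority, e.g. 2 of 3
    on_time = sum(
        1 for ms in latencies_ms if ms is not None and ms <= timeout_ms
    )
    return on_time >= needed


# Two of three nodes finish within the timeout: the write succeeds.
print(quorum_reached([120, 15915, 25000]))  # True
# Only one of three nodes finishes in time: the flush fails.
print(quorum_reached([120, 25000, None]))   # False
```

With three JournalNodes, one slow or unresponsive node is tolerated, but two are not, which matches the "at least 2" requirement stated in the message.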

NameNode Crashing with "flush failed for required journal" exception

2016-04-28 Thread Shaik M
Hi All, I am running an 8-node HDP 2.3 Hadoop cluster (3 master nodes + 5 DataNodes) with Kerberos security. The NameNode has HA enabled and is crashing at least once a day with a "flush failed for required journal" exception. I don't have any network issues between the nodes. I have tried to find the causing t