Hi Chris,
After installing the "NSCD" service on the Hadoop cluster, the NameNode has been
running stably without any downtime for the last three days. :)
Thank you for your help.
Regards,
Shaik
On 29 April 2016 at 11:43, Shaik M wrote:
Thank you for your suggestions.
I found this in the logs:
"WARN security.Groups (Groups.java:fetchGroupList(244)) - Potential
performance problem: getGroups(user=hdfs) took 15915 milliseconds."
First, I'll deploy the "nscd" service on all three JournalNodes and will update
you accordingly.
Thanks,
Shaik
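
For anyone who wants to check group lookup latency directly on a JournalNode
host, here is a minimal Python sketch. It is not what Hadoop runs internally;
it only times a group lookup through the operating system's name service,
which is the path that nscd caches. The helper names and the 5000 ms
threshold are assumptions made for this illustration.

```python
import os
import pwd
import time

# Assumed threshold for this sketch only; it is not taken from the thread.
SLOW_LOOKUP_MS = 5000


def nss_groups(user):
    """Resolve a user's group IDs through the OS name service (NSS)."""
    primary_gid = pwd.getpwnam(user).pw_gid
    return os.getgrouplist(user, primary_gid)


def time_group_lookup(user, lookup=nss_groups):
    """Run one group lookup for `user` and return (groups, elapsed_ms)."""
    start = time.monotonic()
    groups = lookup(user)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return groups, elapsed_ms


def is_slow(elapsed_ms, threshold_ms=SLOW_LOOKUP_MS):
    """Report whether a lookup took at least `threshold_ms` milliseconds."""
    return elapsed_ms >= threshold_ms
```

Calling time_group_lookup("hdfs") on each JournalNode, before and after
enabling nscd, should show whether the lookup itself is the slow step. The
call raises KeyError if the user does not exist on that host.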
On 2
A problem I've seen a few times is that slow lookups of the hdfs user's
groups at the JournalNode introduce delays in handling the edit logging
RPC, which then times out at the NameNode side, ultimately causing an
abort and an HA failover. If your environment is experiencing this, then
you'll see
Hi Shaik,
The error basically indicates that the NameNode crashed while waiting for the
write and sync to complete on a quorum of JournalNodes. In your case, at
least 2 JournalNodes must complete the write and sync within the timeout
period of 20 seconds, which does not seem to be the case.
I will advise
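
To make the quorum rule above concrete, here is a small Python sketch. It is
a simplified model, not Hadoop's implementation: the function names are
hypothetical, and the 20000 ms default mirrors the 20-second timeout mentioned
above (which I believe corresponds to dfs.qjournal.write-txns.timeout.ms).

```python
def quorum_size(num_journal_nodes):
    """Smallest majority of the JournalNodes, e.g. 2 out of 3."""
    return num_journal_nodes // 2 + 1


def write_reaches_quorum(ack_times_ms, timeout_ms=20000):
    """Check whether a majority of JournalNodes acknowledged in time.

    ack_times_ms holds one entry per JournalNode: the time in milliseconds
    it took to acknowledge the write, or None if it never responded.
    """
    needed = quorum_size(len(ack_times_ms))
    on_time = sum(
        1 for t in ack_times_ms if t is not None and t <= timeout_ms
    )
    return on_time >= needed
```

With three JournalNodes, write_reaches_quorum([120, 15915, 25000]) is True
because two nodes answered within 20 seconds, while
write_reaches_quorum([120, 21000, None]) is False, which is the situation
that leads the NameNode to abort.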
Hi All,
I am running an 8-node HDP 2.3 Hadoop cluster (3 master nodes + 5 DataNodes)
with Kerberos security.
The NameNode is configured for HA and crashes at least once a day with a
"flush failed for required journal" exception. There are no network issues
between the nodes.
I have tried to find the causing t