Hi Shaik,

The error indicates that the NameNode shut itself down after waiting for an edit-log write and sync to complete on a quorum of JournalNodes. With three JournalNodes, at least two of them must complete the write and sync within the 20-second timeout (timeout=20000 ms in your log). That does not seem to be happening in your case: the log shows only 10.192.149.194:8485 succeeding before the timeout expired.
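To make the quorum rule concrete, here is a minimal, self-contained simulation of waiting for a write quorum with a deadline. It is a sketch under stated assumptions, not Hadoop's actual `AsyncLoggerSet`/`QuorumCall` code: the class and method names are hypothetical, and each JournalNode is modeled only as the delay before it acknowledges a batch of edits.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of a quorum write with a timeout; not Hadoop code.
public class QuorumWaitSketch {

    /** Majority of n journal nodes: floor(n/2) + 1, i.e. 2 of 3. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /**
     * Simulates sending one batch of edits to each journal node. Node i
     * acknowledges after ackDelaysMs[i] milliseconds; a negative delay means
     * the node never acknowledges. Returns true if a majority acknowledged
     * within timeoutMs, false otherwise.
     */
    static boolean waitForWriteQuorum(long[] ackDelaysMs, long timeoutMs)
            throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(majority(ackDelaysMs.length));
        ScheduledExecutorService nodes =
                Executors.newScheduledThreadPool(ackDelaysMs.length);
        try {
            for (long delay : ackDelaysMs) {
                if (delay >= 0) {
                    // Each simulated node counts the latch down once when it "acks".
                    nodes.schedule(acks::countDown, delay, TimeUnit.MILLISECONDS);
                }
            }
            // await returns false if the latch did not reach zero before the timeout.
            return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            nodes.shutdownNow(); // discard acknowledgements still pending after the deadline
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Two nodes answer quickly: quorum (2 of 3) is reached inside the timeout.
        System.out.println(waitForWriteQuorum(new long[] {50, 100, -1}, 1000));
        // Only one node answers: quorum is not reached and the wait times out.
        System.out.println(waitForWriteQuorum(new long[] {50, -1, -1}, 300));
    }
}
```

With three nodes the majority is two, so a single acknowledging node (as in the log you posted, where only 10.192.149.194 succeeded) is not enough and the wait ends in a timeout, which is the point at which the real NameNode gives up on the required journal.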
I would advise you to check the JournalNode logs; you should find something useful there, possibly the reason the write and sync operations are not completing on the JournalNodes.

Regards,
Gagan Brahmi

On Thu, Apr 28, 2016 at 4:32 AM, Shaik M <munna.had...@gmail.com> wrote:
> Hi All,
>
> I am running 8 node HDP 2.3 Hadoop Cluster (3 Master+5 DataNodes) with Kerberos security.
>
> NameNode having HA and it is crashing at least once in a day with "flush failed for required journal " exception. don't have any network issues between the nodes.
>
> I have tried to find the causing the issue, but, i couldn't able to found proper resolution. Please help me to fix this issue.
>
> Thank you,
> Shaik
>
> 2016-04-28 05:05:23,159 WARN client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.192.149.194:8485]
> 2016-04-28 05:05:23,483 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1522)) - BLOCK* neededReplications = 0, pendingReplications = 0.
> 2016-04-28 05:05:24,160 WARN client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.192.149.194:8485]
> 2016-04-28 05:05:25,145 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.192.149.187:8485, 10.192.149.195:8485, 10.192.149.194:8485], stream=QuorumOutputStream starting at txid 26198626))
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
>     at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
>     at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>     at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>     at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3492)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:787)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:536)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
> 2016-04-28 05:05:25,147 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 26198626
> 2016-04-28 05:05:25,150 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2016-04-28 05:05:25,160 INFO namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG: