Rakesh R created HDDS-1687:
------------------------------

             Summary: Datanode process shutdown due to OOME
                 Key: HDDS-1687
                 URL: https://issues.apache.org/jira/browse/HDDS-1687
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Rakesh R
         Attachments: baseline test - datanode error logs.0.5.0.rar

Ran the Freon benchmark on a three-node cluster. With a larger number of 
parallel writer threads, the datanode daemon hit an OOME and shut down. The 
worker nodes use HDD as the storage type.

+Freon args:+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 
--numOfThreads=100
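
For scale, the total amount of data these arguments ask Freon to write can be estimated with a little arithmetic. The sketch below assumes that numOfBuckets is counted per volume and numOfKeys per bucket; the class and method names are hypothetical and are not part of Freon. Replication is not included in the figure.

```java
// Rough estimate of the logical data volume requested by the Freon args above.
// Assumption: buckets are created per volume and keys per bucket.
final class FreonWorkloadEstimate {

    /** Total number of keys written across all volumes and buckets. */
    public static long totalKeys(long volumes, long bucketsPerVolume, long keysPerBucket) {
        return Math.multiplyExact(Math.multiplyExact(volumes, bucketsPerVolume), keysPerBucket);
    }

    /** Total logical bytes written, before replication. */
    public static long totalBytes(long totalKeys, long keySizeBytes) {
        return Math.multiplyExact(totalKeys, keySizeBytes);
    }

    public static void main(String[] args) {
        long keys = totalKeys(100, 10, 8);            // numOfVolumes, numOfBuckets, numOfKeys
        long bytes = totalBytes(keys, 67_108_864L);   // keySize = 64 MiB
        System.out.println("keys = " + keys);                          // keys = 8000
        System.out.println("logical GiB = " + bytes / (1L << 30));     // logical GiB = 500
    }
}
```

Under that assumption the run writes 8000 keys of 64 MiB each, roughly 500 GiB of logical data, from 100 concurrent writer threads.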


*DN-2*: The process was killed during the test due to an OOME.
{code}
2019-06-13 00:48:11,976 ERROR org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating with exit status 1: a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory /data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed.
java.lang.OutOfMemoryError: Direct buffer memory
               at java.nio.Bits.reserveMemory(Bits.java:694)
               at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
               at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
               at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:44)
               at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:70)
               at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481)
               at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234)
               at java.lang.Thread.run(Thread.java:748)
{code}
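
The trace above fails inside ByteBuffer.allocateDirect, so the JVM's direct buffer pool was exhausted when a new log segment was opened. As a minimal diagnostic sketch (not part of Ratis or Ozone; the class name is hypothetical), the standard BufferPoolMXBean can report how much direct memory a JVM currently holds. It reads the pool named "direct", which is the name HotSpot uses; other JVMs may differ.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Reports direct buffer usage of the JVM this code runs in.
final class DirectBufferUsage {

    /** Returns the MXBean for the direct buffer pool (named "direct" on HotSpot). */
    public static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("no direct buffer pool MXBean found");
    }

    public static void main(String[] args) {
        // Allocate one direct buffer so the pool has something to report.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        BufferPoolMXBean pool = directPool();
        System.out.println("direct buffers   = " + pool.getCount());
        System.out.println("direct bytes used = " + pool.getMemoryUsed());
        System.out.println("sample capacity   = " + buffer.capacity());
    }
}
```

This only observes the JVM it runs in. To watch a datanode, the same bean would have to be read over JMX from that process. The cap on direct memory is set with the JVM option -XX:MaxDirectMemorySize.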

*DN-3*: The process was killed during the test due to an OOME. The datanode 
logs also show many NPEs.
{code}
2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: follower responses installSnapshot Completed
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected exception
java.lang.NullPointerException: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: Previous TermIndex not found for firstIndex = 10062
               at java.util.Objects.requireNonNull(Objects.java:290)
               at org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234)
               at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221)
               at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169)
               at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113)
               at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80)
               at java.lang.Thread.run(Thread.java:748)

{code}

The OOME messages appear in the *.out file:
{code}
Exception in thread "org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3" java.lang.OutOfMemoryError: unable to create new native thread
               at java.lang.Thread.start0(Native Method)
               at java.lang.Thread.start(Thread.java:717)
               at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68)
               at org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153)
               at java.util.ArrayList.forEach(ArrayList.java:1257)
               at org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372)
               at org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394)
               at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97)
               at java.lang.Thread.run(Thread.java:748)
{code}
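
The "unable to create new native thread" error is raised from Thread.start, meaning the process could not obtain another OS thread. A minimal sketch for tracking thread growth in a JVM uses the standard ThreadMXBean. The class name is hypothetical, and reading these values from a running datanode would require JMX access to that process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Prints live, peak, and total-started thread counts for the current JVM.
final class ThreadCountProbe {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** One-line summary suitable for periodic logging. */
    public static String summary() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return String.format("live=%d peak=%d startedTotal=%d",
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                threads.getTotalStartedThreadCount());
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

A startedTotal value that keeps climbing while live stays high would be consistent with the appender restart loop visible in the trace (restartSender calling addAndStartSenders), though that reading is an inference from the stack trace and not a confirmed root cause.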



