Rakesh R created HDDS-1687:
------------------------------

             Summary: Datanode process shutdown due to OOME
                 Key: HDDS-1687
                 URL: https://issues.apache.org/jira/browse/HDDS-1687
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Rakesh R
         Attachments: baseline test - datanode error logs.0.5.0.rar

Ran the Freon benchmark on a three-node cluster. With a larger number of parallel writer threads, the datanode daemon hits an OOME and shuts down. The worker nodes used HDD as the storage type.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 --numOfThreads=100

*DN-2* : The process was killed during the test due to an OOME.

{code}
2019-06-13 00:48:11,976 ERROR org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating with exit status 1: a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory /data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed.
java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
	at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:44)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:70)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481)
	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234)
	at java.lang.Thread.run(Thread.java:748)
{code}

*DN3* : The process was killed during the test due to an OOME. The datanode logs also contain many NPEs.

{code}
2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: follower responses installSnapshot Completed
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected exception
java.lang.NullPointerException: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: Previous TermIndex not found for firstIndex = 10062
	at java.util.Objects.requireNonNull(Objects.java:290)
	at org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234)
	at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221)
	at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169)
	at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113)
	at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80)
	at java.lang.Thread.run(Thread.java:748)

OOME log messages present in the *.out file.

Exception in thread "org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3" java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68)
	at org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372)
	at org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394)
	at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97)
	at java.lang.Thread.run(Thread.java:748)
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
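
For a sense of scale, the Freon arguments in the report can be turned into a rough workload estimate. The sketch below is not part of Freon or Ozone; the class and method names are hypothetical. It assumes that --numOfBuckets counts buckets per volume and --numOfKeys counts keys per bucket. The "in flight" figure is only an upper bound for the case where every writer thread holds one full key at the same moment; the report does not say how the datanode actually buffers the data.

```java
public class FreonWorkload {
    // Assumption: numOfBuckets is per volume and numOfKeys is per bucket.
    public static long totalKeys(long volumes, long bucketsPerVolume, long keysPerBucket) {
        return Math.multiplyExact(Math.multiplyExact(volumes, bucketsPerVolume), keysPerBucket);
    }

    // Logical bytes written, before any replication is taken into account.
    public static long totalBytes(long keys, long keySizeBytes) {
        return Math.multiplyExact(keys, keySizeBytes);
    }

    // Upper bound on key data in flight if each thread holds one full key at once.
    public static long inFlightBytes(long threads, long keySizeBytes) {
        return Math.multiplyExact(threads, keySizeBytes);
    }

    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static long toGiB(long bytes) {
        return bytes / (1024L * 1024L * 1024L);
    }

    public static void main(String[] args) {
        long keySize = 67_108_864L;                 // --keySize (64 MiB)
        long keys = totalKeys(100, 10, 8);          // --numOfVolumes, --numOfBuckets, --numOfKeys
        long bytes = totalBytes(keys, keySize);
        long inFlight = inFlightBytes(100, keySize); // --numOfThreads

        System.out.println("keys = " + keys);                                   // keys = 8000
        System.out.println("logical data = " + toGiB(bytes) + " GiB");          // logical data = 500 GiB
        System.out.println("in flight (bound) = " + toMiB(inFlight) + " MiB");  // in flight (bound) = 6400 MiB
    }
}
```

Under these assumptions the run writes 8000 keys, about 500 GiB of logical data, with up to 6400 MiB of key data in flight across the 100 writer threads.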