[ https://issues.apache.org/jira/browse/YARN-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156350#comment-16156350 ]
Jiandan Yang commented on YARN-7168: ------------------------------------- Sorry, I should create this issue in Hadoop HDFS, can anyone help me move to Hadoop HDFS project? > The size of dataQueue and ackQueue in DataStreamer has no limit when writer > thread is interrupted > ------------------------------------------------------------------------------------------------- > > Key: YARN-7168 > URL: https://issues.apache.org/jira/browse/YARN-7168 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Reporter: Jiandan Yang > Attachments: mat.jpg > > > In our cluster, when found NodeManager frequently FullGC when decommissioning > NodeManager, and we found the biggest object is dataQueue of DataStreamer, it > has almost 6w DFSPacket, and every DFSPacket is about 64k, as shown below. > The root reason is that the size of dataQueue and ackQueue in DataStreamer > has no limit when writer thread is interrupted. > DFSOutputStream#waitAndQueuePacket does not wait when writer thread is > interrupted. I know NodeManager may stop writing when interruped, but > DFSOutputStream also could do something to avoid Infinite growth of dataQueue. > {code:java} > while (!streamerClosed && dataQueue.size() + ackQueue.size() > > dfsClient.getConf().getWriteMaxPackets()) { > if (firstWait) { > Span span = Tracer.getCurrentSpan(); > if (span != null) { > span.addTimelineAnnotation("dataQueue.wait"); > } > firstWait = false; > } > try { > dataQueue.wait(); > } catch (InterruptedException e) { > // If we get interrupted while waiting to queue data, we still > need to get rid > // of the current packet. This is because we have an invariant > that if > // currentPacket gets full, it will get queued before the next > writeChunk. > // > // Rather than wait around for space in the queue, we should > instead try to > // return to the caller as soon as possible, even though we > slightly overrun > // the MAX_PACKETS length. > Thread.currentThread().interrupt(); > break; > } > } > } finally { > Span span = Tracer.getCurrentSpan(); > if ((span != null) && (!firstWait)) { > span.addTimelineAnnotation("end.wait"); > } > } > {code} > !mat.jpg|memory_analysis! -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org