[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp reopened YARN-3760:
-------------------------------

Line numbers are from an old release but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
	at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{code}

_AggregatedLogFormat.LogWriter_
{code}
public void close() {
  try {
    this.writer.close();
  } catch (IOException e) {
    LOG.warn("Exception closing writer", e);
  }
  IOUtils.closeStream(fsDataOStream);
}
{code}

The TFile writer's close may throw {{IllegalStateException}} if the underlying fs data stream failed. Unfortunately, {{LogWriter.close()}} only catches IOE, so the ISE rips out w/o closing the fs data stream.

Additionally, the ctor creates the fs data stream and then a TFile.Writer w/o a try/catch. If the TFile.Writer ctor throws an exception, it's impossible to close the stream.

I haven't checked if there are further issues with closing the writer higher in the stack.
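One possible shape for a fix, sketched without any Hadoop dependencies. {{SafeLogWriter}}, {{WriterFactory}} and {{closeQuietly}} are hypothetical stand-ins (for {{LogWriter}}, the {{TFile.Writer}} ctor call, and {{IOUtils.closeStream}} respectively), not the actual YARN code or the committed patch. The idea is that the stream gets closed in a {{finally}}, whatever the writer's close throws, and that the ctor closes the stream itself if building the writer fails. Swallowing the ISE after logging it mirrors how the existing code treats IOE; rethrowing it would be an equally reasonable choice.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for AggregatedLogFormat.LogWriter; not the Hadoop class.
final class SafeLogWriter implements Closeable {

    /** Hypothetical factory, standing in for "new TFile.Writer(stream, ...)". */
    interface WriterFactory {
        Closeable create(Closeable stream) throws IOException;
    }

    private final Closeable stream; // plays the role of fsDataOStream
    private final Closeable writer; // plays the role of the TFile.Writer

    SafeLogWriter(Closeable stream, WriterFactory factory) throws IOException {
        this.stream = stream;
        Closeable w;
        try {
            w = factory.create(stream);
        } catch (IOException | RuntimeException e) {
            // Nobody else holds a reference to the stream yet, so close it here.
            closeQuietly(stream);
            throw e;
        }
        this.writer = w;
    }

    @Override
    public void close() {
        try {
            writer.close();
        } catch (IOException | RuntimeException e) {
            // Covers the ISE from the stack trace, not just IOE.
            System.err.println("Exception closing writer: " + e);
        } finally {
            // Runs even if writer.close() throws something not caught above.
            closeQuietly(stream);
        }
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException | RuntimeException e) {
            System.err.println("Exception closing stream: " + e);
        }
    }
}

// Small demo: the writer's close() throws an ISE, the stream is closed anyway.
final class SafeLogWriterDemo {
    public static void main(String[] args) throws IOException {
        boolean[] streamClosed = {false};
        Closeable stream = () -> streamClosed[0] = true;
        SafeLogWriter w = new SafeLogWriter(stream, s -> () -> {
            throw new IllegalStateException(
                "Cannot close TFile in the middle of key-value insertion.");
        });
        w.close();
        System.out.println("stream closed: " + streamClosed[0]);
    }
}
```

With a stream that is always closed, the lease on the aggregated log file should be released even when the write path has failed, which is the symptom described in the issue.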
> Log aggregation failures
> -------------------------
>
>                 Key: YARN-3760
>                 URL: https://issues.apache.org/jira/browse/YARN-3760
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The aggregated log file does not appear to be properly closed when writes
> fail. This leaves a lease renewer active in the NM that spams the NN with
> lease renewals. If the token is marked not to be cancelled, the renewals
> appear to continue until the token expires. If the token is cancelled, the
> periodic renew spam turns into a flood of failed connections until the lease
> renewer gives up.