[ 
https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp reopened YARN-3760:
-------------------------------

Line numbers are from an old release but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
        at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
{code}

_AggregatedLogFormat.LogWriter_
{code}
    public void close() {
      try {
        this.writer.close();
      } catch (IOException e) {
        LOG.warn("Exception closing writer", e);
      }
      IOUtils.closeStream(fsDataOStream);
    }
{code}
This calls the TFile writer's close, which may throw {{IllegalStateException}} 
if the underlying fs data stream failed.  Unfortunately it only catches IOE, so 
the ISE rips out w/o closing the fs data stream.
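
One possible shape for a safer close is sketched here. This is only an illustration, not a proposed patch: {{SafeLogWriter}} and {{closeQuietly}} are hypothetical names, and plain {{Closeable}} objects stand in for the TFile writer and the fs data stream so the snippet compiles without Hadoop. The point is that the stream close sits in a {{finally}} and runtime exceptions are caught along with IOE.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for AggregatedLogFormat.LogWriter; not the Hadoop class.
final class SafeLogWriter implements Closeable {
  private final Closeable writer;        // plays the role of TFile.Writer
  private final Closeable fsDataOStream; // plays the role of the fs data stream

  SafeLogWriter(Closeable writer, Closeable fsDataOStream) {
    this.writer = writer;
    this.fsDataOStream = fsDataOStream;
  }

  @Override
  public void close() {
    try {
      writer.close();
    } catch (IOException | RuntimeException e) {
      // RuntimeException covers the IllegalStateException that escaped before.
      System.err.println("Exception closing writer: " + e);
    } finally {
      // Runs on every path, so the underlying stream is always closed.
      closeQuietly(fsDataOStream);
    }
  }

  // Closes a resource and logs, rather than propagates, any failure.
  private static void closeQuietly(Closeable c) {
    if (c == null) {
      return;
    }
    try {
      c.close();
    } catch (IOException | RuntimeException e) {
      System.err.println("Exception closing stream: " + e);
    }
  }
}
```

Whether the ISE should be swallowed or rethrown after the stream is closed is a separate decision; the sketch logs it only to mirror the existing behavior for IOE.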

Additionally, the ctor creates the fs data stream then a TFile.Writer w/o a 
try/catch.  If the TFile.Writer ctor throws an exception, it's impossible to 
close the stream.
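
A guard for the ctor case could look roughly like the following. Again this is a sketch under assumptions: {{GuardedLogWriter}} and {{WriterFactory}} are made-up names, with {{WriterFactory}} standing in for the {{TFile.Writer}} ctor call, and {{Closeable}} standing in for the real stream type. If building the writer fails, the stream is closed before the exception leaves the ctor.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for the LogWriter ctor; names are illustrative only.
final class GuardedLogWriter {
  /** Hypothetical hook that plays the role of the TFile.Writer ctor. */
  interface WriterFactory {
    Closeable create(Closeable stream) throws IOException;
  }

  private final Closeable fsDataOStream;
  private final Closeable writer;

  GuardedLogWriter(Closeable stream, WriterFactory factory) throws IOException {
    this.fsDataOStream = stream;
    boolean success = false;
    try {
      this.writer = factory.create(stream);
      success = true;
    } finally {
      if (!success) {
        // The caller never receives this object, so close the stream here.
        try {
          stream.close();
        } catch (IOException | RuntimeException ignored) {
          // Keep the original construction failure as the reported error.
        }
      }
    }
  }
}
```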

I haven't checked if there are further issues with closing the writer higher 
in the stack.

> Log aggregation failures 
> -------------------------
>
>                 Key: YARN-3760
>                 URL: https://issues.apache.org/jira/browse/YARN-3760
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The aggregated log file does not appear to be properly closed when writes 
> fail.  This leaves a lease renewer active in the NM that spams the NN with 
> lease renewals.  If the token is marked not to be cancelled, the renewals 
> appear to continue until the token expires.  If the token is cancelled, the 
> periodic renew spam turns into a flood of failed connections until the lease 
> renewer gives up.



