[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Graves updated YARN-1670:
--------------------------------
    Priority: Critical  (was: Major)

> aggregated log writer can write more log data then it says is the log length
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1670
>                 URL: https://issues.apache.org/jira/browse/YARN-1670
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Priority: Critical
>
> We have seen exceptions when using 'yarn logs' to read log files:
>       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>       at java.lang.Long.parseLong(Long.java:441)
>       at java.lang.Long.parseLong(Long.java:483)
>       at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
>       at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
>       at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
>       at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> We traced it down to the reader trying to read the file type of the next file
> at a position that still holds log data from the previous file. The log length
> was written as a certain size, but the log data was actually longer than that.
> The write() routine in LogValue first writes the log file length, but when it
> then writes the log itself it simply reads to the end of the file. This is a
> race condition: if someone is still writing to the file when it is aggregated,
> the length written can be too small.
> We should have the write() routine stop once it has written the number of
> bytes it said was the length. It would be nice if we could somehow tell the
> user the log might be truncated, but I'm not sure of a good way to do this.
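
A minimal sketch of the proposed fix, assuming the writer can read the log
through a plain InputStream. The class BoundedLogCopy and the method
copyBounded are hypothetical names used for illustration only; this is not
the actual LogValue.write() code. It stops copying once the length recorded
up front has been written, even if the file has grown since:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BoundedLogCopy {

    /**
     * Copies at most declaredLength bytes from in to out and returns the
     * number of bytes actually copied. Hypothetical helper, not Hadoop code.
     */
    public static long copyBounded(InputStream in, OutputStream out, long declaredLength)
            throws IOException {
        byte[] buf = new byte[65535];
        long written = 0; // long, so logs larger than 2 GB do not overflow the counter
        while (written < declaredLength) {
            // Never ask for more bytes than remain under the declared length.
            int toRead = (int) Math.min(buf.length, declaredLength - written);
            int len = in.read(buf, 0, toRead);
            if (len == -1) {
                break; // the file ended before the declared length was reached
            }
            out.write(buf, 0, len);
            written += len;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // The header recorded a length of 5, but the file has since grown to 11 bytes.
        byte[] grown = "hello world".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyBounded(new ByteArrayInputStream(grown), out, 5);
        System.out.println(copied + " bytes: " + out.toString("US-ASCII")); // prints "5 bytes: hello"
    }
}
```

With this approach the reader always finds the next file type where the
header says it should be. The cost is that bytes appended after the length
was recorded are silently dropped, which is the truncation concern raised
in the description.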
> We also noticed a bug in readAContainerLogsForALogType, where it is using
> an int for curRead whereas it should be using a long.
>       while (len != -1 && curRead < fileLength) {
> This isn't actually a problem right now, as it looks like the underlying
> decoder is doing the right thing and the len condition exits the loop.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)