[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946404#comment-13946404 ]
Hudson commented on YARN-1670:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #520 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/520/])
YARN-1670. aggregated log writer can write more log data then it says is the log length (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1580957)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


> aggregated log writer can write more log data then it says is the log length
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1670
>                 URL: https://issues.apache.org/jira/browse/YARN-1670
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Mit Desai
>            Priority: Critical
>             Fix For: 3.0.0, 0.23.11, 2.4.0, 2.5.0
>
>         Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch,
> YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch,
> YARN-1670-v4-b23.patch, YARN-1670-v4-b23.patch, YARN-1670-v4.patch,
> YARN-1670-v4.patch, YARN-1670.patch, YARN-1670.patch
>
>
> We have seen exceptions when using 'yarn logs' to read log files.
>       at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>       at java.lang.Long.parseLong(Long.java:441)
>       at java.lang.Long.parseLong(Long.java:483)
>       at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
>       at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
>       at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
>       at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> We traced it down to the reader trying to read the file type of the next file
> but where it reads is still log data from the previous file.
> What happened was that the log length was written as a certain size, but the
> log data was actually longer than that.
> Inside the write() routine in LogValue, it first writes the logfile length,
> but when it then writes the log itself it simply reads to the end of the
> file. There is a race condition here: if someone is still writing to the file
> when it is aggregated, the length written could be too small.
> We should have the write() routine stop once it has written whatever it said
> was the length. It would be nice if we could somehow tell the user the log
> might be truncated, but I'm not sure of a good way to do this.
> We also noticed a bug in readAContainerLogsForALogType, where it is using
> an int for curRead whereas it should be using a long.
>       while (len != -1 && curRead < fileLength) {
> This isn't actually a problem right now, as it looks like the underlying
> decoder is doing the right thing and the len condition exits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
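
For illustration only, the proposal in the issue description (stop writing once the declared length has been reached, and track the count in a long rather than an int) could be sketched roughly as below. This is not the code from the YARN-1670 patches; the class BoundedLogCopy and the method copyAtMost are hypothetical names, and the 4096-byte buffer size is an arbitrary assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not taken from AggregatedLogFormat.
final class BoundedLogCopy {

  /**
   * Copies at most declaredLength bytes from in to out, even if the source
   * has grown since its length was recorded. Returns the bytes copied.
   */
  static long copyAtMost(InputStream in, OutputStream out, long declaredLength)
      throws IOException {
    byte[] buf = new byte[4096];   // buffer size is an arbitrary choice
    long written = 0;              // long, so files over 2 GB do not overflow
    while (written < declaredLength) {
      // Never ask for more than the bytes still owed to the declared length.
      int toRead = (int) Math.min(buf.length, declaredLength - written);
      int len = in.read(buf, 0, toRead);
      if (len == -1) {
        break;                     // source ended before the declared length
      }
      out.write(buf, 0, len);
      written += len;
    }
    return written;
  }
}
```

With a copy bounded this way, a file that is still being appended to during aggregation contributes only the bytes covered by the length that was already written, so a reader that trusts that length lands on the next file's header. The return value lets a caller detect the opposite case, where fewer bytes were available than declared.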