[ https://issues.apache.org/jira/browse/YARN-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mit Desai updated YARN-1670:
----------------------------
    Attachment: YARN-1670-v3.patch

Thanks [~vinodkv] for the feedback.
1- I changed the formatting.
2- I have modified the patch to use less memory. It should work now. I have also tested the new patch in my Eclipse IDE with HeapSize=1GB, and the test passes every time I run it.

> aggregated log writer can write more log data than it says is the log length
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1670
>                 URL: https://issues.apache.org/jira/browse/YARN-1670
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 0.23.10, 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Mit Desai
>            Priority: Critical
>         Attachments: YARN-1670-b23.patch, YARN-1670-v2-b23.patch, YARN-1670-v2.patch, YARN-1670-v3-b23.patch, YARN-1670-v3.patch, YARN-1670.patch, YARN-1670.patch
>
>
> We have seen exceptions when using 'yarn logs' to read log files.
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> 	at java.lang.Long.parseLong(Long.java:441)
> 	at java.lang.Long.parseLong(Long.java:483)
> 	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518)
> 	at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178)
> 	at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130)
> 	at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246)
> We traced it down to the reader trying to read the file type of the next file, but what it reads is still log data from the previous file. What happened was that the Log Length was written as a certain size, but the log data was actually longer than that.
> Inside the write() routine in LogValue, it first writes what the log file length is, but then when it goes to write the log itself it just goes to the end of the file.
> There is a race condition here: if someone is still writing to the file when it goes to be aggregated, the length written could be too small.
> We should have the write() routine stop when it has written whatever it said was the length. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this.
> We also noticed a bug in readAContainerLogsForALogType, where it is using an int for curRead whereas it should be using a long:
> while (len != -1 && curRead < fileLength) {
> This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
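
For illustration, here is a minimal, hypothetical sketch (not the actual YARN-1670 patch, and not the real AggregatedLogFormat code) of the two fixes discussed above: the copy loop is capped at the length that was already written to the stream, so a file that grows during aggregation cannot overrun its declared size, and the running byte count is a long rather than an int, so files larger than 2 GB do not overflow the counter. The class and method names here are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedLogCopy {

    // Copy at most 'declaredLength' bytes from 'in' to 'out'.
    // curRead is a long, matching the fix suggested for
    // readAContainerLogsForALogType.
    static long copyUpTo(InputStream in, OutputStream out,
                         long declaredLength) throws IOException {
        byte[] buf = new byte[65536];
        long curRead = 0;
        int len;
        while (curRead < declaredLength
                && (len = in.read(buf, 0,
                        (int) Math.min(buf.length, declaredLength - curRead))) != -1) {
            out.write(buf, 0, len);
            curRead += len;
        }
        return curRead;
    }

    public static void main(String[] args) throws IOException {
        // The "log file" grew after its length was recorded as 10 bytes:
        // copying stops at the declared length instead of running to EOF.
        byte[] grownFile = "0123456789EXTRA".getBytes();
        ByteArrayOutputStream aggregated = new ByteArrayOutputStream();
        long written = copyUpTo(new ByteArrayInputStream(grownFile),
                                aggregated, 10);
        System.out.println(written);           // 10
        System.out.println(aggregated.size()); // 10
    }
}
```

With a cap like this, the reader's next Long.parseLong lands on the next record's length field rather than on overrun log bytes, which is exactly the failure seen in the stack trace above.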