[ https://issues.apache.org/jira/browse/YARN-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765570#comment-15765570 ]
Robert Kanter edited comment on YARN-5529 at 1/3/17 8:01 PM: ------------------------------------------------------------- A few things: # Please look into the failed tests # In {{ReadWriteDiskValidator}}, we create a file using {{Files.createTempFile}} and delete it near the end. If some kind of exception happens (either because we're throwing a {{DiskErrorException}} or a more generic {{IOException}} from some problem), we won't end up deleting the file. If this happens a lot, we can start filling up the disk with junk over time. We should put the delete call in a {{finally}}. # Typo: {{throw new DiskErrorException("Data in file has bee corrupted.");}} It should be {{been}}. # In {{TestReadWriteDiskValidator#testReadWriteDiskValidator}}, the {{100}} used in a few places could be made a constant, or at least a variable # In {{TestReadWriteDiskValidator#testCheckFailures}}, the {{try-catch}} blocks where we're expecting a failure should {{fail}} if a {{DiskErrorException}} is not thrown, and should do some basic checking in the {{catch}} statement. e.g. {code:java} try { readWriteDiskValidator.checkStatus(testDir); fail("some message"); } catch (DiskErrorException e) { some_basic_check(e); } {code} # In {{TestReadWriteDiskValidator#testCheckFailures}} when we set the permissions to {{000}}, might that cause a problem when trying to clean it up (i.e. {{mvn clean}})? was (Author: rkanter): A few things: # Please look into the failed tests # In {{ReadWriteDiskValidator}}, we create a file using {{Files.createTempFile}} and delete it near the end. If some kind of exception happens (either because we're throwing a {{DiskErrorException}} or a more generic {{IOException from some problem), we won't end up deleting the file. If this happens a lot, we can start filling up the disk with junk over time. We should put the delete call in a {{finally}}. # Typo: {{throw new DiskErrorException("Data in file has bee corrupted.");}} It should be {{been}}. # In {{TestReadWriteDiskValidator#testReadWriteDiskValidator}}, the {{100}} used in a few places could be made a constant, or at least a variable # In {{TestReadWriteDiskValidator#testCheckFailures}}, the {{try-catch}} blocks where we're expecting a failure should {{fail}} if a {{DiskErrorException}} is not thrown, and should do some basic checking in the {{catch}} statement. e.g. {code:java} try { readWriteDiskValidator.checkStatus(testDir); fail("some message"); } catch (DiskErrorException e) { some_basic_check(e); } {code} # In {{TestReadWriteDiskValidator#testCheckFailures}} when we set the permissions to {{000}}, might that cause a problem when trying to clean it up (i.e. {{mvn clean}})? > Create new DiskValidator class with metrics > ------------------------------------------- > > Key: YARN-5529 > URL: https://issues.apache.org/jira/browse/YARN-5529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Reporter: Ray Chiang > Assignee: Yufei Gu > Labels: supportability > Attachments: YARN-5529.001.patch, YARN-5529.002.patch, > YARN-5529.003.patch > > > With really large clusters, the basic DiskValidator isn't sufficient for some > of the less common types of disk failures. > Look at a new DiskValidator that could do one or more of the following: > - Add new tests to find more problems > - Add new metrics to at least characterize problems that we haven't predicted -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org