[jira] [Commented] (YARN-3001) RM dies because of divide by zero
[ https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595423#comment-14595423 ] Hui Zheng commented on YARN-3001: - The only non-INFO log is the following (the failure is so sudden that there is no other WARN or ERROR before it). There are several tens of thousands of jobs per day.
{code}
2015-06-21 09:53:44,696 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.ArithmeticException: / by zero
	at org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1335)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignNodeLocalContainers(LeafQueue.java:1185)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1136)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:557)
	at java.lang.Thread.run(Thread.java:724)
2015-06-21 09:53:44,696 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. {code} > RM dies because of divide by zero > - > > Key: YARN-3001 > URL: https://issues.apache.org/jira/browse/YARN-3001 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: hoelog >Assignee: Rohith Sharma K S > > RM dies because of divide by zero exception. > {code} > 2014-12-31 21:27:05,022 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599) > at java.lang.Thread.run(Thread.java:745) > 2014-12-31 21:27:05,023 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
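Both stack traces above end in DefaultResourceCalculator.computeAvailableContainers (line 37), which performs an integer division of the node's available memory by the requested memory. A minimal, hypothetical Java sketch of that failure mode and one possible guard follows; the method names and the guard policy are illustrative, not the actual Hadoop code:

```java
public class AvailableContainers {

    // Unguarded integer division: the shape of the computation that threw
    // the ArithmeticException in the traces above.
    static int computeUnguarded(int availableMb, int requiredMb) {
        return availableMb / requiredMb; // throws ArithmeticException if requiredMb == 0
    }

    // Guarded variant (illustrative policy): a zero-memory request is treated
    // as placing no bound on the number of containers, so no division occurs.
    static int computeGuarded(int availableMb, int requiredMb) {
        if (requiredMb <= 0) {
            return Integer.MAX_VALUE;
        }
        return availableMb / requiredMb;
    }

    public static void main(String[] args) {
        System.out.println(computeGuarded(8192, 1024)); // prints 8
        try {
            computeUnguarded(8192, 0);
        } catch (ArithmeticException e) {
            // same "/ by zero" failure mode that killed the RM dispatcher thread
            System.out.println(e.getMessage());
        }
    }
}
```

Whatever the real fix turns out to be, the point is that the scheduler should reject or clamp a zero-resource request before dividing, rather than letting the event-dispatcher thread die and take the whole ResourceManager down with it.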
[jira] [Commented] (YARN-3826) Race condition in ResourceTrackerService: potential wrong diagnostics messages
[ https://issues.apache.org/jira/browse/YARN-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595391#comment-14595391 ] Devaraj K commented on YARN-3826: - Thanks [~chengbing.liu] for details. > Race condition in ResourceTrackerService: potential wrong diagnostics messages > -- > > Key: YARN-3826 > URL: https://issues.apache.org/jira/browse/YARN-3826 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Attachments: YARN-3826.01.patch > > > Since we are calling {{setDiagnosticsMessage}} in {{nodeHeartbeat}}, which > can be called concurrently, the static {{resync}} and {{shutdown}} may have > wrong diagnostics messages in some cases. > On the other side, these static members can hardly save any memory, since the > normal heartbeat responses are created for each heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
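The race described above comes from mutating a shared static response object from concurrently running heartbeat handlers. A small, self-contained sketch of the anti-pattern and the per-call fix; the class and method names here are illustrative stand-ins, not the actual ResourceTrackerService types:

```java
// Illustrative stand-in for a heartbeat response with a diagnostics field.
class HeartbeatResponse {
    private volatile String diagnostics;
    void setDiagnosticsMessage(String msg) { this.diagnostics = msg; }
    String getDiagnosticsMessage() { return diagnostics; }
}

public class ResyncRace {
    // Anti-pattern: one shared static instance mutated on every heartbeat.
    static final HeartbeatResponse SHARED_RESYNC = new HeartbeatResponse();

    static HeartbeatResponse resyncShared(String nodeId) {
        SHARED_RESYNC.setDiagnosticsMessage("Node " + nodeId + " should resync");
        return SHARED_RESYNC; // a concurrent caller can overwrite this before it is serialized
    }

    // Fix: build a fresh response per call, as the normal heartbeat path already does.
    static HeartbeatResponse resyncFresh(String nodeId) {
        HeartbeatResponse r = new HeartbeatResponse();
        r.setDiagnosticsMessage("Node " + nodeId + " should resync");
        return r;
    }

    public static void main(String[] args) {
        // Even single-threaded, interleaving two calls shows the overwrite:
        HeartbeatResponse first = resyncShared("n1");
        resyncShared("n2");
        System.out.println(first.getDiagnosticsMessage()); // n1's message was clobbered by n2's

        System.out.println(resyncFresh("n1").getDiagnosticsMessage()); // stays intact
    }
}
```

This also matches the observation in the comment that the static members save essentially no memory, since normal heartbeat responses are allocated per heartbeat anyway.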
[jira] [Commented] (YARN-3790) TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in trunk for FS scheduler
[ https://issues.apache.org/jira/browse/YARN-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595372#comment-14595372 ] Rohith Sharma K S commented on YARN-3790: - [~jianhe] Do you have any comments on the patch? > TestWorkPreservingRMRestart#testSchedulerRecovery fails intermittently in > trunk for FS scheduler > > > Key: YARN-3790 > URL: https://issues.apache.org/jira/browse/YARN-3790 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, test >Reporter: Rohith Sharma K S >Assignee: zhihai xu > Attachments: YARN-3790.000.patch > > > Failure trace is as follows > {noformat} > Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 284.078 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart > testSchedulerRecovery[1](org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart) > Time elapsed: 6.502 sec <<< FAILURE! > java.lang.AssertionError: expected:<6144> but was:<8192> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.assertMetrics(TestWorkPreservingRMRestart.java:853) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkFSQueue(TestWorkPreservingRMRestart.java:342) > at > org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testSchedulerRecovery(TestWorkPreservingRMRestart.java:241) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595342#comment-14595342 ] Hadoop QA commented on YARN-1427: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 15s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 0m 18s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740952/YARN-1427-trunk.4.patch | | Optional Tests | | | git revision | trunk / 6c7a9d5 | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8305/console | This message was automatically generated. > yarn-env.cmd should have the analog comments that are in yarn-env.sh > > > Key: YARN-1427 > URL: https://issues.apache.org/jira/browse/YARN-1427 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Zhijie Shen >Assignee: Zhiyuan Yang > Labels: BB2015-05-TBR, newbie, windows > Attachments: YARN-1427-trunk.2.patch, YARN-1427-trunk.3.patch, > YARN-1427-trunk.4.patch, YARN-1427.1.patch > > > There're the paragraphs of about RM/NM env vars (probably AHS as well soon) > in yarn-env.sh. Should the windows version script provide the similar > comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595338#comment-14595338 ] Chris Nauroth commented on YARN-3834: - Xuan, thank you for the code review and commit. > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated YARN-1427: --- Attachment: YARN-1427-trunk.4.patch Re-submitted the original patch to get debug info. > yarn-env.cmd should have the analog comments that are in yarn-env.sh > > > Key: YARN-1427 > URL: https://issues.apache.org/jira/browse/YARN-1427 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Zhijie Shen >Assignee: Zhiyuan Yang > Labels: BB2015-05-TBR, newbie, windows > Attachments: YARN-1427-trunk.2.patch, YARN-1427-trunk.3.patch, > YARN-1427-trunk.4.patch, YARN-1427.1.patch > > > There're the paragraphs of about RM/NM env vars (probably AHS as well soon) > in yarn-env.sh. Should the windows version script provide the similar > comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3001) RM dies because of divide by zero
[ https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595325#comment-14595325 ] Rohith Sharma K S commented on YARN-3001: - Hi [~huizane], thanks for the reply. Would you please attach the RM logs if you have them? > RM dies because of divide by zero > - > > Key: YARN-3001 > URL: https://issues.apache.org/jira/browse/YARN-3001 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: hoelog >Assignee: Rohith Sharma K S > > RM dies because of divide by zero exception. > {code} > 2014-12-31 21:27:05,022 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599) > at java.lang.Thread.run(Thread.java:745) > 2014-12-31 21:27:05,023 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3001) RM dies because of divide by zero
[ https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595293#comment-14595293 ] Hui Zheng commented on YARN-3001: - This problem has also happened twice (31/Jan/15 and 20/Jun/15) in our cluster. We use hadoop-2.2.0 and also set "yarn.scheduler.minimum-allocation-mb=3072". > RM dies because of divide by zero > - > > Key: YARN-3001 > URL: https://issues.apache.org/jira/browse/YARN-3001 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: hoelog >Assignee: Rohith Sharma K S > > RM dies because of divide by zero exception. > {code} > 2014-12-31 21:27:05,022 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599) > at java.lang.Thread.run(Thread.java:745) > 2014-12-31 21:27:05,023 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595263#comment-14595263 ] Hadoop QA commented on YARN-1427: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740910/YARN-1427-trunk.3.patch | | Optional Tests | | | git revision | trunk / 6c7a9d5 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8304/console | This message was automatically generated. > yarn-env.cmd should have the analog comments that are in yarn-env.sh > > > Key: YARN-1427 > URL: https://issues.apache.org/jira/browse/YARN-1427 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Zhijie Shen >Assignee: Zhiyuan Yang > Labels: BB2015-05-TBR, newbie, windows > Attachments: YARN-1427-trunk.2.patch, YARN-1427-trunk.3.patch, > YARN-1427.1.patch > > > There're the paragraphs of about RM/NM env vars (probably AHS as well soon) > in yarn-env.sh. Should the windows version script provide the similar > comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595257#comment-14595257 ] Hudson commented on YARN-3834: -- FAILURE: Integrated in Hadoop-trunk-Commit #8043 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8043/]) YARN-3834. Scrub debug logging of tokens during resource localization. Contributed by Chris Nauroth (xgong: rev 6c7a9d502a633b5aca75c9798f19ce4a5729014e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595253#comment-14595253 ] Xuan Gong commented on YARN-3834: - Committed into trunk/branch-2. Thanks, Chris > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Fix For: 2.8.0 > > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3834) Scrub debug logging of tokens during resource localization.
[ https://issues.apache.org/jira/browse/YARN-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595252#comment-14595252 ] Xuan Gong commented on YARN-3834: - +1 LGTM. Will commit > Scrub debug logging of tokens during resource localization. > --- > > Key: YARN-3834 > URL: https://issues.apache.org/jira/browse/YARN-3834 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: YARN-3834.001.patch > > > During resource localization, the NodeManager logs tokens at debug level to > aid troubleshooting. This includes the full token representation. Best > practice is to avoid logging anything secret, even at debug level. We can > improve on this by changing the logging to use a scrubbed representation of > the token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595250#comment-14595250 ] zhihai xu commented on YARN-3591: - Hi [~jlowe], thanks for the thorough analysis. My assumption was that the files on a bad disk are most likely inaccessible; it looks like that assumption was wrong. Your first approach seems better, with fewer side effects. Item 5 may be very time-consuming. I can think of the following possible improvements for your first approach: # Cache all the local directories which are used by running containers for LocalizedResource with non-zero refcount. This may speed up item 5. We only need to keep the cached directories on a disk that has just been repaired. # Maybe we can remove the LocalizedResource entry with zero refcount for a bad disk from the map in {{onDirsChanged}}. We should also remove it when handling the {{RELEASE}} ResourceEvent. # It looks like we still need to store the bad local dirs in the state store, so we can track repaired disks during NM recovery. > Resource Localisation on a bad disk causes subsequent containers failure > - > > Key: YARN-3591 > URL: https://issues.apache.org/jira/browse/YARN-3591 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, > YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch, YARN-3591.5.patch > > > It happens when a resource is localised on the disk, after localising that > disk has gone bad. NM keeps paths for localised resources in memory. At the > time of resource request isResourcePresent(rsrc) will be called which calls > file.exists() on the localised path. > In some cases when disk has gone bad, inodes are stilled cached and > file.exists() returns true. But at the time of reading, file will not open. 
> Note: file.exists() actually calls stat64 natively which returns true because > it was able to find inode information from the OS. > A proposal is to call file.list() on the parent path of the resource, which > will call open() natively. If the disk is good it should return an array of > paths with length at-least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
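The proposal quoted above (list the parent directory to force an open() instead of trusting a stat-based exists()) can be sketched as follows. This is a hedged illustration of the idea, not the NM's actual isResourcePresent implementation:

```java
import java.io.File;

public class ResourcePresenceCheck {

    // Sketch: File.exists() answers from stat metadata, which a failed disk
    // may still serve from cached inodes; File.list() on the parent forces an
    // open() of the directory and returns null on an I/O failure.
    static boolean isResourcePresent(File localizedPath) {
        if (!localizedPath.exists()) { // stat-based; can be a false positive on a bad disk
            return false;
        }
        File parent = localizedPath.getParentFile();
        if (parent == null) {
            return true; // no parent to probe; fall back to the exists() answer
        }
        String[] entries = parent.list(); // open()-based; null signals the disk is unreadable
        return entries != null && entries.length >= 1;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("rsrc", ".tmp");
        System.out.println(isResourcePresent(f)); // true on a healthy disk
        f.delete();
    }
}
```

The trade-off, as the discussion notes, is that the directory listing adds I/O to every presence check, which is why caching the directories used by running containers (improvement 1 above) matters.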
[jira] [Commented] (YARN-3837) javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
[ https://issues.apache.org/jira/browse/YARN-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595147#comment-14595147 ] Hadoop QA commented on YARN-3837: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 3m 10s | Tests passed in hadoop-yarn-server-applicationhistoryservice. 
| | | | 39m 49s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12740923/0002-YARN-3837.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c7d022b | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8303/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8303/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8303/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8303/console | This message was automatically generated. > javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for > auth options > -- > > Key: YARN-3837 > URL: https://issues.apache.org/jira/browse/YARN-3837 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-3837.patch, 0002-YARN-3837.patch > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The javadocs for {{TimelineAuthenticationFilterInitializer}} talk about the > prefix {{yarn.timeline-service.authentication.}}, but the code uses {{ > "yarn.timeline-service.http-authentication."}} as the prefix. > best to use {{@value}} and let the javadocs sort it out for themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3837) javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
[ https://issues.apache.org/jira/browse/YARN-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3837: --- Attachment: 0002-YARN-3837.patch No test cases are required since this is a javadoc-only update. > javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for > auth options > -- > > Key: YARN-3837 > URL: https://issues.apache.org/jira/browse/YARN-3837 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-3837.patch, 0002-YARN-3837.patch > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The javadocs for {{TimelineAuthenticationFilterInitializer}} talk about the > prefix {{yarn.timeline-service.authentication.}}, but the code uses {{ > "yarn.timeline-service.http-authentication."}} as the prefix. > best to use {{@value}} and let the javadocs sort it out for themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
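The {{@value}} suggestion in the issue above works because the javadoc tool inlines the constant's actual value, so the documented prefix can never drift from the code again. A minimal illustration (the class and constant names here are made up for the example, not the real TimelineAuthenticationFilterInitializer):

```java
public class TimelinePrefixDoc {

    /**
     * Configuration prefix for timeline HTTP authentication options.
     * All options are read under the {@value #PREFIX} prefix; the javadoc
     * tool substitutes the constant's value here, keeping docs and code in sync.
     */
    public static final String PREFIX = "yarn.timeline-service.http-authentication.";

    public static void main(String[] args) {
        // e.g. the full key for the "type" option under this prefix
        System.out.println(PREFIX + "type");
    }
}
```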
[jira] [Commented] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595122#comment-14595122 ] Bibin A Chundatt commented on YARN-3838: Attached patch 0001-YARN-3838 for review. Please review it. > Rest API failing when ip configured in RM address in secure https mode > -- > > Key: YARN-3838 > URL: https://issues.apache.org/jira/browse/YARN-3838 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, > 0001-YARN-3838.patch, 0002-YARN-3810.patch > > > Steps to reproduce > === > 1.Configure hadoop.http.authentication.kerberos.principal as below > {code:xml} > > hadoop.http.authentication.kerberos.principal > HTTP/_h...@hadoop.com > > {code} > 2. In RM web address also configure IP > 3. Startup RM > Call Rest API for RM {{ curl -i -k --insecure --negotiate -u : https IP > /ws/v1/cluster/info"}} > *Actual* > Rest API failing > {code} > 2015-06-16 19:03:49,845 DEBUG > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos credentails) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos credentails) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) > at > org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) > 
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
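For context on the failure above: SPNEGO principals of the form {{HTTP/_HOST}} rely on Hadoop substituting {{_HOST}} with the server's hostname and finding a matching {{HTTP/<hostname>}} entry in the keytab. If the RM web address is configured with a raw IP, no matching service principal can be resolved, producing the "Failed to find any Kerberos credentials" error shown. A hedged sketch of a configuration that avoids the problem (hostnames, realm, and port here are placeholders, and this illustrates the general SPNEGO setup rather than the specific fix in the attached patch):
{code:xml}
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <!-- Use a resolvable hostname, not an IP, so _HOST can be substituted. -->
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>rm-host.example.com:8090</value>
</property>
{code}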
[jira] [Updated] (YARN-3838) Rest API failing when ip configured in RM address in secure https mode
[ https://issues.apache.org/jira/browse/YARN-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3838: --- Attachment: 0001-YARN-3838.patch > Rest API failing when ip configured in RM address in secure https mode > -- > > Key: YARN-3838 > URL: https://issues.apache.org/jira/browse/YARN-3838 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-HADOOP-12096.patch, 0001-YARN-3810.patch, > 0001-YARN-3838.patch, 0002-YARN-3810.patch > > > Steps to reproduce > === > 1.Configure hadoop.http.authentication.kerberos.principal as below > {code:xml} > > hadoop.http.authentication.kerberos.principal > HTTP/_h...@hadoop.com > > {code} > 2. In RM web address also configure IP > 3. Startup RM > Call Rest API for RM {{ curl -i -k --insecure --negotiate -u : https IP > /ws/v1/cluster/info"}} > *Actual* > Rest API failing > {code} > 2015-06-16 19:03:49,845 DEBUG > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: No valid credentials provided > (Mechanism level: Failed to find any Kerberos credentails) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos credentails) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:348) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:519) > at > org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3839) Quit throwing NMNotYetReadyException
Karthik Kambatla created YARN-3839:
--------------------------------------
             Summary: Quit throwing NMNotYetReadyException
                 Key: YARN-3839
                 URL: https://issues.apache.org/jira/browse/YARN-3839
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Karthik Kambatla
            Assignee: Karthik Kambatla

Quit throwing NMNotYetReadyException when NM has not yet registered with the RM.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595046#comment-14595046 ]

Naganarasimha G R commented on YARN-3045:
-----------------------------------------
Hi [~djp], [~sjlee0] & [~zjshen],
Please find the attached patch, rebased on top of YARN-3792. I would like to discuss two points:
# I prefer to have all container-related events and entities published by NMTimelinePublisher, so I wanted to push the container usage metrics to NMTimelinePublisher as well. This keeps all the NM timeline logic in one place and removes the thread-pool handling in {{ContainersMonitorImpl}}. (The latter point will no longer be an issue once YARN-3367 is handled, but for the former reason I would still prefer the move.)
# While testing with TestDistributedShell, I found that a few of the container metrics events were failing because of a race condition: when the AM container finishes and the collector for the app is removed, events published for the app by the current NM and other NMs may still be in the pipeline. I was wondering whether we could have a timer task that periodically cleans up collectors after some period, rather than removing a collector immediately when its AM container finishes.

> [Event producers] Implement NM writing container lifecycle events to ATS
> ------------------------------------------------------------------------
>
>                 Key: YARN-3045
>                 URL: https://issues.apache.org/jira/browse/YARN-3045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>              Labels: BB2015-05-TBR
>         Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch,
>                      YARN-3045-YARN-2928.004.patch, YARN-3045.20150420-1.patch
>
> Per design in YARN-2928, implement NM writing container lifecycle events and
> container system metrics to ATS.
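The timer-based cleanup idea from the comment above (remove a collector only after a grace period, so late events still in the pipeline are accepted) could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual NM collector code; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: delay collector removal instead of removing it
 *  immediately when the AM container finishes. */
public class DelayedCollectorCleaner {
  private final Map<String, Object> collectors = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long graceMillis;

  public DelayedCollectorCleaner(long graceMillis) {
    this.graceMillis = graceMillis;
  }

  public void register(String appId, Object collector) {
    collectors.put(appId, collector);
  }

  /** Called when the AM container finishes. The collector stays available
   *  for the grace period so events from this NM and other NMs that are
   *  still in flight can be delivered before cleanup. */
  public void onAppFinished(String appId) {
    scheduler.schedule(
        () -> collectors.remove(appId), graceMillis, TimeUnit.MILLISECONDS);
  }

  public boolean hasCollector(String appId) {
    return collectors.containsKey(appId);
  }

  public void shutdown() {
    scheduler.shutdownNow();
  }
}
```

The grace period trades a bounded amount of extra collector lifetime for not dropping late events; a production version would also need to handle re-registration during the grace window.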
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-3045:
------------------------------------
    Attachment: YARN-3045-YARN-2928.004.patch

> [Event producers] Implement NM writing container lifecycle events to ATS
> ------------------------------------------------------------------------
>
>                 Key: YARN-3045
>                 URL: https://issues.apache.org/jira/browse/YARN-3045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>              Labels: BB2015-05-TBR
>         Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch,
>                      YARN-3045-YARN-2928.004.patch, YARN-3045.20150420-1.patch
>
> Per design in YARN-2928, implement NM writing container lifecycle events and
> container system metrics to ATS.
[jira] [Commented] (YARN-3837) javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
[ https://issues.apache.org/jira/browse/YARN-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595006#comment-14595006 ]

Hadoop QA commented on YARN-3837:
---------------------------------
| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 0s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 3m 4s | Tests passed in hadoop-yarn-server-applicationhistoryservice. |
| | | | 40m 27s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12740877/0001-YARN-3837.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c7d022b |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8302/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8302/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8302/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8302/console |

This message was automatically generated.

> javadocs of TimelineAuthenticationFilterInitializer give wrong prefix for auth options
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-3837
>                 URL: https://issues.apache.org/jira/browse/YARN-3837
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Bibin A Chundatt
>            Priority: Minor
>         Attachments: 0001-YARN-3837.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The javadocs for {{TimelineAuthenticationFilterInitializer}} talk about the
> prefix {{yarn.timeline-service.authentication.}}, but the code uses
> {{"yarn.timeline-service.http-authentication."}} as the prefix.
> Best to use {{@value}} and let the javadocs sort it out for themselves.
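The {{@value}} suggestion from the issue description can be illustrated with a small self-contained sketch. The class and method names below are hypothetical, not the actual {{TimelineAuthenticationFilterInitializer}} code; the point is that javadoc referencing the constant via {{@value}} can never drift from what the code uses.

```java
/**
 * Hypothetical sketch of the {@value} suggestion: javadoc elsewhere can
 * reference the prefix as {@value #PREFIX}, so the documented string is
 * always the one the code actually uses.
 */
public class TimelineAuthPrefixExample {

  /** The configuration prefix for auth options. */
  public static final String PREFIX =
      "yarn.timeline-service.http-authentication.";

  /** Builds the full configuration key for a given option suffix. */
  public static String fullKey(String suffix) {
    return PREFIX + suffix;
  }
}
```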
[jira] [Updated] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang updated YARN-1427:
-------------------------------
    Attachment: YARN-1427-trunk.3.patch

> yarn-env.cmd should have the analog comments that are in yarn-env.sh
> --------------------------------------------------------------------
>
>                 Key: YARN-1427
>                 URL: https://issues.apache.org/jira/browse/YARN-1427
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Zhijie Shen
>            Assignee: Zhiyuan Yang
>              Labels: BB2015-05-TBR, newbie, windows
>         Attachments: YARN-1427-trunk.2.patch, YARN-1427-trunk.3.patch, YARN-1427.1.patch
>
> There are paragraphs about the RM/NM env vars (probably AHS as well soon)
> in yarn-env.sh. Should the Windows version of the script provide similar
> comments?
[jira] [Commented] (YARN-1427) yarn-env.cmd should have the analog comments that are in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594987#comment-14594987 ]

Zhiyuan Yang commented on YARN-1427:
------------------------------------
Attached patch. Thanks!

> yarn-env.cmd should have the analog comments that are in yarn-env.sh
> --------------------------------------------------------------------
>
>                 Key: YARN-1427
>                 URL: https://issues.apache.org/jira/browse/YARN-1427
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Zhijie Shen
>            Assignee: Zhiyuan Yang
>              Labels: BB2015-05-TBR, newbie, windows
>         Attachments: YARN-1427-trunk.2.patch, YARN-1427-trunk.3.patch, YARN-1427.1.patch
>
> There are paragraphs about the RM/NM env vars (probably AHS as well soon)
> in yarn-env.sh. Should the Windows version of the script provide similar
> comments?