[jira] [Updated] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2962: --- Hadoop Flags: Incompatible change > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch > > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
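For context on how such a limit can be enforced, the sketch below shows one possible bucketing scheme; the split-index knob and the path layout are illustrative assumptions, not the committed design. The idea is to split the application znode name so that children are spread across intermediate parent znodes instead of piling up under a single parent.

{code:title=ZNodeBucketing.java (sketch)|borderStyle=solid}
public class ZNodeBucketing {
  /**
   * Split an application znode name into a parent bucket and a leaf so that
   * no single znode accumulates all application children. splitIndex is the
   * number of trailing digits moved into the leaf; it is a hypothetical knob.
   */
  static String bucketedPath(String rootPath, String appIdNode, int splitIndex) {
    int cut = appIdNode.length() - splitIndex;
    String parent = appIdNode.substring(0, cut);   // e.g. application_1450000000000_000
    String leaf = appIdNode.substring(cut);        // e.g. 1
    return rootPath + "/" + parent + "/" + leaf;
  }

  public static void main(String[] args) {
    // application_1450000000000_0001 lands under .../application_1450000000000_000/1
    System.out.println(bucketedPath("/rmstore/ZKRMStateRoot/RMAppRoot",
        "application_1450000000000_0001", 1));
  }
}
{code}

With such a layout, a getChildren() call on any single parent returns at most 10^splitIndex entries, which keeps individual ZK responses well under the default message-size limits.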
[jira] [Assigned] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue
[ https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-4488: Assignee: Wangda Tan > CapacityScheduler: Compute per-container allocation latency and roll up to > get per-application and per-queue > > > Key: YARN-4488 > URL: https://issues.apache.org/jira/browse/YARN-4488 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Karthik Kambatla >Assignee: Wangda Tan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4485) Capture per-application and per-queue container allocation latency
[ https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-4485: -- Assignee: (was: Karthik Kambatla) Leaving the umbrella JIRA unassigned. > Capture per-application and per-queue container allocation latency > -- > > Key: YARN-4485 > URL: https://issues.apache.org/jira/browse/YARN-4485 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla > Labels: supportability, tuning > > Per-application and per-queue container allocation latencies would go a long > way towards helping with tuning scheduler queue configs. > This umbrella JIRA tracks adding these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4485) [Umbrella] Capture per-application and per-queue container allocation latency
[ https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4485: --- Summary: [Umbrella] Capture per-application and per-queue container allocation latency (was: Capture per-application and per-queue container allocation latency) > [Umbrella] Capture per-application and per-queue container allocation latency > - > > Key: YARN-4485 > URL: https://issues.apache.org/jira/browse/YARN-4485 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla > Labels: supportability, tuning > > Per-application and per-queue container allocation latencies would go a long > way towards helping with tuning scheduler queue configs. > This umbrella JIRA tracks adding these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue
Karthik Kambatla created YARN-4488: -- Summary: CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue Key: YARN-4488 URL: https://issues.apache.org/jira/browse/YARN-4488 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4487) FairScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue
Karthik Kambatla created YARN-4487: -- Summary: FairScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue Key: YARN-4487 URL: https://issues.apache.org/jira/browse/YARN-4487 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4486) Add requestTime to ResourceRequest
Karthik Kambatla created YARN-4486: -- Summary: Add requestTime to ResourceRequest Key: YARN-4486 URL: https://issues.apache.org/jira/browse/YARN-4486 Project: Hadoop YARN Issue Type: Sub-task Reporter: Karthik Kambatla Assignee: Karthik Kambatla Add a field requestTime to ResourceRequest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4485) Capture per-application and per-queue container allocation latency
[ https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065253#comment-15065253 ] Karthik Kambatla commented on YARN-4485: One potential approach is to add requestTime to ResourceRequest and use it to compute the difference at allocation time to get per-container allocation latency. This could optionally be rolled up at the application and queue levels. > Capture per-application and per-queue container allocation latency > -- > > Key: YARN-4485 > URL: https://issues.apache.org/jira/browse/YARN-4485 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: supportability, tuning > > Per-application and per-queue container allocation latencies would go a long > way towards helping with tuning scheduler queue configs. > This umbrella JIRA tracks adding these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
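To make the proposal concrete, here is a minimal standalone sketch; the class and the rollup structure are assumptions for illustration, not the eventual YARN API. It computes per-container latency from a request timestamp at allocation time and folds it into per-application and per-queue aggregates.

{code:title=AllocationLatencySketch.java (sketch)|borderStyle=solid}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AllocationLatencySketch {
  /** Simple running average keyed by application or queue name. */
  static class LatencyStats {
    long count;
    long totalMillis;
    synchronized void add(long latencyMillis) { count++; totalMillis += latencyMillis; }
    synchronized double avgMillis() { return count == 0 ? 0 : (double) totalMillis / count; }
  }

  private final Map<String, LatencyStats> perApp = new ConcurrentHashMap<>();
  private final Map<String, LatencyStats> perQueue = new ConcurrentHashMap<>();

  /** Called when a container is allocated against a request stamped with requestTime. */
  void onAllocation(String appId, String queue, long requestTimeMillis, long allocationTimeMillis) {
    long latency = allocationTimeMillis - requestTimeMillis;
    perApp.computeIfAbsent(appId, k -> new LatencyStats()).add(latency);
    perQueue.computeIfAbsent(queue, k -> new LatencyStats()).add(latency);
  }

  double appAvgMillis(String appId) {
    LatencyStats stats = perApp.get(appId);
    return stats == null ? 0 : stats.avgMillis();
  }
}
{code}

The only protocol change this assumes is the requestTime field on ResourceRequest tracked in YARN-4486; the rollup itself can live entirely inside the scheduler.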
[jira] [Created] (YARN-4485) Capture per-application and per-queue container allocation latency
Karthik Kambatla created YARN-4485: -- Summary: Capture per-application and per-queue container allocation latency Key: YARN-4485 URL: https://issues.apache.org/jira/browse/YARN-4485 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Per-application and per-queue container allocation latencies would go a long way towards helping with tuning scheduler queue configs. This umbrella JIRA tracks adding these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834
[ https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065249#comment-15065249 ] Karthik Kambatla commented on YARN-4032: [~jianhe] - are you working on this? If not, I would like to take this up. > Corrupted state from a previous version can still cause RM to fail with NPE > due to same reasons as YARN-2834 > > > Key: YARN-4032 > URL: https://issues.apache.org/jira/browse/YARN-4032 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Jian He >Priority: Critical > Attachments: YARN-4032.prelim.patch > > > YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if > someone is upgrading from a previous version, the state can still be > inconsistent and then RM will still fail with NPE after upgrade to 2.6.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065243#comment-15065243 ] Hadoop QA commented on YARN-4290: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 39s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 40s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 146m 56s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_91 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.cli
[jira] [Commented] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065242#comment-15065242 ] Hadoop QA commented on YARN-4480: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 59s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 34s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 36s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 22s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 58s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 26s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 223m 5s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.web.TestWebHDFS | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.server.datanode.TestBlockScanner | | | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | JDK v1.7.0_91 Faile
[jira] [Commented] (YARN-110) AM releases too many containers due to the protocol
[ https://issues.apache.org/jira/browse/YARN-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065236#comment-15065236 ] Karthik Kambatla commented on YARN-110: --- Got it. I see the value in fixing it. Any proposals on how to fix it? > AM releases too many containers due to the protocol > --- > > Key: YARN-110 > URL: https://issues.apache.org/jira/browse/YARN-110 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Attachments: YARN-110.patch > > > - AM sends a request asking for 4 containers on host H1. > - Asynchronously, host H1 reaches the RM and gets assigned 4 containers. The RM at > this point sets the value against H1 to > zero in its aggregate request-table for all apps. > - In the meanwhile, the AM comes to need 3 more containers, so a total of 7 > including the 4 from the previous request. > - Today, the AM sends the absolute number of 7 against H1 to the RM as part of its > request table. > - The RM seems to override its earlier value of zero against H1 with 7 against > H1, and thus allocates 7 more > containers. > - The AM already gets 4 in this scheduling iteration, but gets 7 more, a total of > 11 instead of the required 7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
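A small standalone model of the race described above (simplified accounting, not the actual scheduler code) shows how overwriting the pending count with the absolute ask inflates the total, and how deducting the containers allocated since the last response would keep it correct:

{code:title=AbsoluteAskRace.java (sketch)|borderStyle=solid}
public class AbsoluteAskRace {
  public static void main(String[] args) {
    int pendingOnH1 = 4;          // RM aggregate request table for host H1
    int allocated = 0;

    // RM assigns 4 containers on H1 and zeroes the pending count.
    allocated += pendingOnH1;
    pendingOnH1 = 0;

    // AM now needs 3 more, but the protocol sends the absolute total: 7.
    int absoluteAsk = 7;

    // Naive overwrite: RM hands out 7 more, 11 in total instead of 7.
    pendingOnH1 = absoluteAsk;
    System.out.println("naive total = " + (allocated + pendingOnH1));

    // Deducting what was already allocated since the last response keeps it at 7.
    int adjusted = Math.max(0, absoluteAsk - allocated);
    System.out.println("adjusted total = " + (allocated + adjusted));
  }
}
{code}

Whether the fix should be a delta-based protocol or an RM-side adjustment against in-flight allocations is exactly the open question in the comment above; the sketch only illustrates the arithmetic.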
[jira] [Commented] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065233#comment-15065233 ] Hadoop QA commented on YARN-4476: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s {color} | {color:red} Patch generated 121 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 0, now 121). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 47s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new issues (was 2, now 4). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 2s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 137m 28s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resour
[jira] [Commented] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065210#comment-15065210 ] Chris Douglas commented on YARN-4476: - bq. do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? I thought about it, but: # It's a (potential) internal detail of the node label implementation, with other classes in the package # The {{nodelabel}} package is sparse right now # None of these classes are user-facing, so they're easy to move So I put in in the {{nodelabels}} package, but don't have a strong opinion. > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065210#comment-15065210 ] Chris Douglas edited comment on YARN-4476 at 12/19/15 3:40 AM: --- bq. do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? I thought about it, but: # It's a (potential) internal detail of the node label implementation, with other classes in the package # The {{nodelabels}} package is sparse right now # None of these classes are user-facing, so they're easy to move So I put in in the {{nodelabels}} package, but don't have a strong opinion. was (Author: chris.douglas): bq. do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? I thought about it, but: # It's a (potential) internal detail of the node label implementation, with other classes in the package # The {{nodelabel}} package is sparse right now # None of these classes are user-facing, so they're easy to move So I put in in the {{nodelabels}} package, but don't have a strong opinion. > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065209#comment-15065209 ] Naganarasimha G R commented on YARN-4350: - Yes, I was also thinking about adding this issue to it. Though that JIRA (YARN-4385?) initially mentioned the issue as not reproducible, I can track this issue there and try to fix it! > TestDistributedShell fails for V2 scenarios > --- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch, > YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These tests fail more often than not when run by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065195#comment-15065195 ] Wangda Tan commented on YARN-4476: -- Hi [~chris.douglas], It will be very useful to evaluate node label expressions, thanks! And do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065191#comment-15065191 ] Wangda Tan commented on YARN-4304: -- [~sunilg], bq. I was slightly confused by the earlier comment. Ideally we can make use of users, but we may need to get the first user and use his AM limit. This is perfectly fine for now until we have per-user-am-limit. Agree! And I think if there are no users in the queue, we can use the queue's am-limit directly (instead of "N/A" or 0, etc.). > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, > 0005-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"
[ https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065189#comment-15065189 ] Wangda Tan commented on YARN-4195: -- Hi [~curino], Thanks for the explanation, I understand the proposal now. When more than one node property needs to be shared by queues with specified capacities, this proposal will be very useful. For example, if both PUBLICIP and GPU are required to be shared by queues with specified capacities. I'm also thinking about how a user would configure the cluster when this feature is enabled: 1. User adds partitions A/B 2. User configures capacities for partitions = \{A_B, A, B and \} on queues. 3. User assigns the A/B partitions to nodes 4. User submits jobs with partitions (2/3 could be swapped) But if the user doesn't configure capacity for A_B and: - Submits a job with A_B, what should we do? - Assigns A/B to one node, what should we do? Also, a cluster with N different "atomic" partitions could produce up to 2^N "actual" partition combinations; how can we avoid such a dimension explosion? If any of the properties doesn't need guaranteed capacity for sharing, we can make it a simple constraint (YARN-3409), which will be FCFS, and arbitrary combinations of constraints could be supported. > Support of node-labels in the ReservationSystem "Plan" > -- > > Key: YARN-4195 > URL: https://issues.apache.org/jira/browse/YARN-4195 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4195.patch > > > As part of YARN-4193 we need to enhance the InMemoryPlan (and related > classes) to track the per-label available resources, as well as the per-label > reservation-allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4481) negative pending resource of queues leads to applications in accepted status indefinitely
[ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065186#comment-15065186 ] Varun Saxena commented on YARN-4481: [~sunilg], we do not have AM debug logs. And the RM debug logs are from after the event, so all we get from them is that the pending resources are negative, which leads to the log gu-chi mentioned above. Let us see if we can get something more from the code. > negative pending resource of queues leads to applications in accepted status > indefinitely > - > > Key: YARN-4481 > URL: https://issues.apache.org/jira/browse/YARN-4481 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: gu-chi >Priority: Critical > Attachments: jmx.txt > > > Met a scenario of negative pending resources with the capacity scheduler; in jmx > it shows: > {noformat} > "PendingMB" : -4096, > "PendingVCores" : -1, > "PendingContainers" : -1, > {noformat} > full jmx information attached. > This is not just a jmx UI issue; the actual pending resource of the queue is also > negative, as I see the debug log of > bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because > it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY > node-partition= | ParentQueue.java > This leads to the {{NULL_ASSIGNMENT}}. > The background is submitting hundreds of applications that consume all cluster > resources, so reservations happen. While running, network faults were injected by > a tool; the injection types were delay, jitter, > repeat, packet loss and disorder. Then most of the submitted applications were > killed. > Is anyone else also seeing negative pending resources, or does anyone have an idea of how this happens? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4481) negative pending resource of queues leads to applications in accepted status indefinitely
[ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065181#comment-15065181 ] gu-chi commented on YARN-4481: -- I added some extra logs for tracing. Do you have any idea how I can reproduce this? > negative pending resource of queues leads to applications in accepted status > indefinitely > - > > Key: YARN-4481 > URL: https://issues.apache.org/jira/browse/YARN-4481 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: gu-chi >Priority: Critical > Attachments: jmx.txt > > > Met a scenario of negative pending resources with the capacity scheduler; in jmx > it shows: > {noformat} > "PendingMB" : -4096, > "PendingVCores" : -1, > "PendingContainers" : -1, > {noformat} > full jmx information attached. > This is not just a jmx UI issue; the actual pending resource of the queue is also > negative, as I see the debug log of > bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because > it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY > node-partition= | ParentQueue.java > This leads to the {{NULL_ASSIGNMENT}}. > The background is submitting hundreds of applications that consume all cluster > resources, so reservations happen. While running, network faults were injected by > a tool; the injection types were delay, jitter, > repeat, packet loss and disorder. Then most of the submitted applications were > killed. > Is anyone else also seeing negative pending resources, or does anyone have an idea of how this happens? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
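For that kind of extra tracing, one option is a guard around the pending-resource decrement that logs a stack trace whenever the value would go negative, so the offending code path shows up directly in the RM log. The class below is purely illustrative and does not correspond to existing YARN internals.

{code:title=PendingResourceGuard.java (sketch)|borderStyle=solid}
public class PendingResourceGuard {
  private long pendingMB;
  private long pendingVCores;

  /** Decrement pending resources, logging loudly if the result would be negative. */
  synchronized void decrementPending(long mb, long vcores, String queue) {
    if (pendingMB - mb < 0 || pendingVCores - vcores < 0) {
      // A negative value here means we are releasing more than was ever requested.
      new Exception("Pending would go negative for queue " + queue
          + ": pendingMB=" + pendingMB + " decMB=" + mb
          + " pendingVCores=" + pendingVCores + " decVCores=" + vcores)
          .printStackTrace();
    }
    pendingMB -= mb;
    pendingVCores -= vcores;
  }
}
{code}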
[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065177#comment-15065177 ] Tao Jie commented on YARN-4477: --- Failed test cases are irrelevant to this patch, and work in my local environment. > FairScheduler: encounter infinite loop in attemptScheduling > --- > > Key: YARN-4477 > URL: https://issues.apache.org/jira/browse/YARN-4477 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Tao Jie >Assignee: Tao Jie > Attachments: YARN-4477.001.patch, YARN-4477.002.patch, > YARN-4477.003.patch > > > This problem is introduced by YARN-4270 which add limitation on reservation. > In FSAppAttempt.reserve(): > {code} > if (!reservationExceedsThreshold(node, type)) { > LOG.info("Making reservation: node=" + node.getNodeName() + > " app_id=" + getApplicationId()); > if (!alreadyReserved) { > getMetrics().reserveResource(getUser(), container.getResource()); > RMContainer rmContainer = > super.reserve(node, priority, null, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } else { > RMContainer rmContainer = node.getReservedContainer(); > super.reserve(node, priority, rmContainer, container); > node.reserveResource(this, priority, rmContainer); > setReservation(node); > } > } > {code} > If reservation over threshod, current node will not set reservation. > But in attemptScheduling in FairSheduler: > {code} > while (node.getReservedContainer() == null) { > boolean assignedContainer = false; > if (!queueMgr.getRootQueue().assignContainer(node).equals( > Resources.none())) { > assignedContainers++; > assignedContainer = true; > > } > > if (!assignedContainer) { break; } > if (!assignMultiple) { break; } > if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; } > } > {code} > assignContainer(node) still return FairScheduler.CONTAINER_RESERVED, which not > equals to Resources.none(). > As a result, if multiple assign is enabled and maxAssign is unlimited, this > while loop would never break. > I suppose that assignContainer(node) should return Resource.none rather than > CONTAINER_RESERVED when the attempt doesn't take the reservation because of > the limitation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
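The proposal in the description can be modelled with a small standalone sketch (not the real FairScheduler classes): when the reservation is skipped because of the threshold, assignContainer must report "none" rather than "reserved", otherwise the multi-assign loop in attemptScheduling never terminates.

{code:title=ReservationLoopSketch.java (sketch)|borderStyle=solid}
/**
 * Standalone model of the YARN-4477 loop: when the reservation is skipped
 * because of the threshold, assignContainer must report NONE instead of
 * RESERVED, otherwise the multi-assign loop below never breaks.
 */
public class ReservationLoopSketch {
  enum Assignment { NONE, RESERVED, ALLOCATED }

  static Assignment assignContainer(boolean exceedsThreshold) {
    if (exceedsThreshold) {
      return Assignment.NONE;       // proposed fix; returning RESERVED here loops forever
    }
    return Assignment.RESERVED;
  }

  public static void main(String[] args) {
    int assigned = 0;
    boolean assignMultiple = true;  // maxAssign unlimited
    while (true) {
      Assignment a = assignContainer(true /* reservation exceeds threshold */);
      if (a == Assignment.NONE) break;          // terminates only with the fix
      assigned++;
      if (!assignMultiple) break;
    }
    System.out.println("assigned=" + assigned);
  }
}
{code}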
[jira] [Commented] (YARN-4481) negative pending resource of queues leads to applications in accepted status indefinitely
[ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065175#comment-15065175 ] gu-chi commented on YARN-4481: -- Same when using DRC. :( Debug logging was only enabled after I saw the issue, so there is no debug information from before that. I have the RM log, but it is several GB, with hundreds of applications. > negative pending resource of queues leads to applications in accepted status > indefinitely > - > > Key: YARN-4481 > URL: https://issues.apache.org/jira/browse/YARN-4481 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.2 >Reporter: gu-chi >Priority: Critical > Attachments: jmx.txt > > > Met a scenario of negative pending resources with the capacity scheduler; in jmx > it shows: > {noformat} > "PendingMB" : -4096, > "PendingVCores" : -1, > "PendingContainers" : -1, > {noformat} > full jmx information attached. > This is not just a jmx UI issue; the actual pending resource of the queue is also > negative, as I see the debug log of > bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because > it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY > node-partition= | ParentQueue.java > This leads to the {{NULL_ASSIGNMENT}}. > The background is submitting hundreds of applications that consume all cluster > resources, so reservations happen. While running, network faults were injected by > a tool; the injection types were delay, jitter, > repeat, packet loss and disorder. Then most of the submitted applications were > killed. > Is anyone else also seeing negative pending resources, or does anyone have an idea of how this happens? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime are not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065168#comment-15065168 ] Varun Saxena commented on YARN-4238: bq. If it works this way, will it be useful for the client? It may even be difficult for users to understand. Yeah, that is what I wanted to highlight: specifically for metrics, even the cell timestamps are not really the timestamps of the Put. > createdTime and modifiedTime are not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4290: -- Attachment: 0003-YARN-4290.patch Thanks [~leftnoteasy] and [~Naganarasimha Garla]. Yes, one test case is related. Uploading a new patch. Other test case failures are known and tracked via separate tickets. > "yarn nodes -list" should print all nodes reports information > - > > Key: YARN-4290 > URL: https://issues.apache.org/jira/browse/YARN-4290 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0002-YARN-4290.patch, 0003-YARN-4290.patch > > > Currently, the "yarn nodes -list" command only shows > - "Node-Id", > - "Node-State", > - "Node-Http-Address", > - "Number-of-Running-Containers" > I think we need to show more information such as used resource, just like the > "yarn nodes -status" command. > Maybe we can add a parameter to -list, such as "-show-details", to enable > printing all detailed information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime are not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065167#comment-15065167 ] Naganarasimha G R commented on YARN-4238: - Thanks [~varun_saxena] for summarizing it. Varun and I synced offline on this, and I feel the summary is fine. And yes [~sjlee0], I also feel your opinion is right: we can just keep it as part of the entity object and avoid having it as part of a filter, which would complicate things. bq. The cell timestamps for metrics are filled based on what's reported from the client If it works this way, will it be useful for the client? It may even be difficult for users to understand. > createdTime and modifiedTime are not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065162#comment-15065162 ] Rohith Sharma K S commented on YARN-4164: - Thanks [~jianhe] for reviewing and committing the patch. > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns an empty UpdateApplicationPriorityResponse. > But the RM updates the priority to cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenario, we need to report the updated > priority back to the client rather than keeping quiet, where the client > assumes that the given priority itself was taken. > The same scenario can also happen during application submission, but I feel that > when it is explicitly invoked via ApplicationClientProtocol#updateApplicationPriority(), > the response should contain the updated priority. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
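The behaviour being agreed on can be sketched as follows (a standalone illustration; the actual response field and method names may differ): the RM caps the requested priority at cluster.max-priority and returns the effective value, so the client never has to guess what was actually applied.

{code:title=PriorityCapSketch.java (sketch)|borderStyle=solid}
public class PriorityCapSketch {
  /** Cap the requested priority at the cluster maximum and return the effective value. */
  static int effectivePriority(int requested, int clusterMaxPriority) {
    return Math.min(requested, clusterMaxPriority);
  }

  public static void main(String[] args) {
    int clusterMax = 10;
    int requested = 45;
    // The update response should carry 10 here, not silently accept 45.
    System.out.println("effective priority = " + effectivePriority(requested, clusterMax));
  }
}
{code}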
[jira] [Commented] (YARN-4238) createdTime and modifiedTime are not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065160#comment-15065160 ] Varun Saxena commented on YARN-4238: [~sjlee0], if modified time is only for debugging, is there any need to filter rows based on it? Maybe we can remove it. We can then simply fill it on the basis of cell timestamps and return it in the response. One thing, though: the cell timestamps for metrics are filled based on what's reported from the client. > createdTime and modifiedTime are not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476-1.patch Add ASF license headers, fix findbugs warnings, address some of the checkstyle issues. > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels
Sunil G created YARN-4484: - Summary: Available Resource calculation for a queue is not correct when used with labels Key: YARN-4484 URL: https://issues.apache.org/jira/browse/YARN-4484 Project: Hadoop YARN Issue Type: Sub-task Components: capacity scheduler Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G To calculate the available resource for a queue, we have to get the total resource allocated for all labels in the queue and compare it to its usage. Also address the comments given in [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874 ] by [~leftnoteasy] on the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065147#comment-15065147 ] Sunil G commented on YARN-4304: --- bq.I think the longer term fix should be, add a by-partition info to queue metrics, including max/guaranteed/available/used, etc. I can help to review proposal/patches. Yes, this looks fine to me. I will track this with a different JIRA. This new ticket will track the cluster metrics' total memory when used with labels. bq.How about call them calculateAndGetAMResourceLimitPerPartition and getAMResourceLimitPerPartition? +1. Since we have this calculated information, we can make use of the same. I will make the changes and upload REST/UI screenshots along with the updated patch. bq.Is there any concern of you for this approach? I was slightly confused by the earlier comment. Ideally we can make use of {{users}}, but we may need to get the first user and use his AM limit. This is perfectly fine for now until we have per-user-am-limit. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, > 0005-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
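The fallback being discussed might look roughly like this (a standalone sketch, not CapacityScheduler code): report the first user's AM limit while per-user limits do not exist, and fall back to the queue's per-partition AM limit when the queue has no users, rather than reporting N/A or 0.

{code:title=AMLimitDisplaySketch.java (sketch)|borderStyle=solid}
import java.util.List;

public class AMLimitDisplaySketch {
  /** Per-user AM limits are not supported yet, so every user shares the same limit. */
  static long amLimitToDisplay(List<Long> perUserAmLimits, long queueAmLimitForPartition) {
    if (perUserAmLimits.isEmpty()) {
      // No users in the queue: show the queue's per-partition AM limit directly.
      return queueAmLimitForPartition;
    }
    // Until per-user-am-limit exists, the first user's limit equals everyone's.
    return perUserAmLimits.get(0);
  }

  public static void main(String[] args) {
    System.out.println(amLimitToDisplay(List.of(), 8192));       // -> 8192
    System.out.println(amLimitToDisplay(List.of(4096L), 8192));  // -> 4096
  }
}
{code}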
[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065119#comment-15065119 ] Karthik Kambatla commented on YARN-914: --- bq. On the other hand, there are additional details and component level designs that the JIRA design document not necessarily discuss or touch. Are you able to share these details in an "augmented" design doc? Agreeing on the design would greatly help with review/commits later. As far as implementation goes, it is recommended to create subtasks as you see fit. Note that it is easier to review smaller chunks of code. Also, since you guys have implemented it already, can you comment on how much of the code changes are in frequently updated parts? If not much, it might make sense to develop on a branch and merge it to trunk. > (Umbrella) Support graceful decommission of nodemanager > --- > > Key: YARN-914 > URL: https://issues.apache.org/jira/browse/YARN-914 > Project: Hadoop YARN > Issue Type: Improvement > Components: graceful >Affects Versions: 2.0.4-alpha >Reporter: Luke Lu >Assignee: Junping Du > Attachments: Gracefully Decommission of NodeManager (v1).pdf, > Gracefully Decommission of NodeManager (v2).pdf, > GracefullyDecommissionofNodeManagerv3.pdf > > > When NMs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact to running applications. > Currently if a NM is decommissioned, all running containers on the NM need to > be rescheduled on other NMs. Further more, for finished map tasks, if their > map output are not fetched by the reducers of the job, these map tasks will > need to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated YARN-4480: Attachment: YARN-4480-v2.patch Thanks Daniel. I manually edited it and uploaded a new version. > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng > Attachments: YARN-4480-v1.patch, YARN-4480-v2.patch > > > It was noticed that there are some unnecessary dependencies on Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065077#comment-15065077 ] Wangda Tan commented on YARN-4292: -- Committed to branch-2.8. > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, > 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100
[ https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065078#comment-15065078 ] Wangda Tan commented on YARN-4398: -- Committed to branch-2.8. > Yarn recover functionality causes the cluster running slowly and the cluster > usage rate is far below 100 > > > Key: YARN-4398 > URL: https://issues.apache.org/jira/browse/YARN-4398 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: NING DING >Assignee: NING DING > Fix For: 2.7.3 > > Attachments: YARN-4398.2.patch, YARN-4398.3.patch, YARN-4398.4.patch > > > In my hadoop cluster, the resourceManager recover functionality is enabled > with FileSystemRMStateStore. > I found this cause the yarn cluster running slowly and cluster usage rate is > just 50 even there are many pending Apps. > The scenario is below. > In thread A, the RMAppImpl$RMAppNewlySavingTransition is calling > storeNewApplication method defined in RMStateStore. This storeNewApplication > method is synchronized. > {code:title=RMAppImpl.java|borderStyle=solid} > private static final class RMAppNewlySavingTransition extends > RMAppTransition { > @Override > public void transition(RMAppImpl app, RMAppEvent event) { > // If recovery is enabled then store the application information in a > // non-blocking call so make sure that RM has stored the information > // needed to restart the AM after RM restart without further client > // communication > LOG.info("Storing application with id " + app.applicationId); > app.rmContext.getStateStore().storeNewApplication(app); > } > } > {code} > {code:title=RMStateStore.java|borderStyle=solid} > public synchronized void storeNewApplication(RMApp app) { > ApplicationSubmissionContext context = app > > .getApplicationSubmissionContext(); > assert context instanceof ApplicationSubmissionContextPBImpl; > ApplicationStateData appState = > ApplicationStateData.newInstance( > app.getSubmitTime(), app.getStartTime(), context, app.getUser()); > dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)); > } > {code} > In thread B, the FileSystemRMStateStore is calling > storeApplicationStateInternal method. It's also synchronized. > This storeApplicationStateInternal method saves an ApplicationStateData into > HDFS and it normally costs 90~300 milliseconds in my hadoop cluster. > {code:title=FileSystemRMStateStore.java|borderStyle=solid} > public synchronized void storeApplicationStateInternal(ApplicationId appId, > ApplicationStateData appStateDataPB) throws Exception { > Path appDirPath = getAppDir(rmAppRoot, appId); > mkdirsWithRetries(appDirPath); > Path nodeCreatePath = getNodePath(appDirPath, appId.toString()); > LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath); > byte[] appStateData = appStateDataPB.getProto().toByteArray(); > try { > // currently throw all exceptions. May need to respond differently for > HA > // based on whether we have lost the right to write to FS > writeFileWithRetries(nodeCreatePath, appStateData, true); > } catch (Exception e) { > LOG.info("Error storing info for app: " + appId, e); > throw e; > } > } > {code} > Think thread B firstly comes into > FileSystemRMStateStore.storeApplicationStateInternal method, then thread A > will be blocked for a while because of synchronization. In ResourceManager > there is only one RMStateStore instance. In my cluster it's > FileSystemRMStateStore type. 
> Debugging the RMAppNewlySavingTransition.transition method, the thread stack > shows it's called from the AsyncDispatcher.dispatch method. This method's code is > shown below. > {code:title=AsyncDispatcher.java|borderStyle=solid} > protected void dispatch(Event event) { > //all events go thru this loop > if (LOG.isDebugEnabled()) { > LOG.debug("Dispatching the event " + event.getClass().getName() + "." > + event.toString()); > } > Class type = event.getType().getDeclaringClass(); > try{ > EventHandler handler = eventDispatchers.get(type); > if(handler != null) { > handler.handle(event); > } else { > throw new Exception("No handler for registered for " + type); > } > } catch (Throwable t) { > //TODO Maybe log the state of the queue > LOG.fatal("Error in dispatcher thread", t); > // If serviceStop is called, we should exit this thread gracefully. > if (exitOnDispatchException > && (ShutdownHookManager.get().isShutdownIn
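To make the contention described above concrete: because storeNewApplication and storeApplicationStateInternal synchronize on the same RMStateStore instance, the event-dispatching thread can stall for the full duration of a 90~300 ms HDFS write. The following is a minimal, self-contained Java sketch of the general direction (do the slow write off the shared monitor so the caller only pays for an enqueue); the class, method names, and timings are hypothetical stand-ins, not the actual YARN-4398 patch.
{code:title=NonBlockingStoreSketch.java|borderStyle=solid}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch (not RM code): the caller only enqueues the write; a
// dedicated store thread performs the slow "file-system" call, so the
// synchronized fast path never waits behind a 90~300 ms write.
public class NonBlockingStoreSketch {

  private final ExecutorService storeExecutor =
      Executors.newSingleThreadExecutor();

  // Fast path: called from the event-dispatching thread.
  public synchronized void storeNewApplication(String appId, byte[] state) {
    // Hand off the work only; no slow I/O is done under this lock.
    storeExecutor.submit(() -> writeToStore(appId, state));
  }

  // Slow path: runs on the store thread, outside the caller's monitor.
  private void writeToStore(String appId, byte[] state) {
    try {
      Thread.sleep(200); // stand-in for a 90~300 ms HDFS write
      System.out.println("stored " + appId + " (" + state.length + " bytes)");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public void stop() throws InterruptedException {
    storeExecutor.shutdown();
    storeExecutor.awaitTermination(10, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    NonBlockingStoreSketch store = new NonBlockingStoreSketch();
    long start = System.nanoTime();
    store.storeNewApplication("application_1_0001", new byte[16]);
    System.out.printf("enqueue took %.2f ms%n", (System.nanoTime() - start) / 1e6);
    store.stop();
  }
}
{code}
A single-threaded executor keeps the writes ordered, which matters when state transitions must be persisted in sequence; the actual fix for this JIRA may instead narrow the synchronized sections, so treat this only as an illustration of why the dispatcher stalls today.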
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065072#comment-15065072 ] Wangda Tan commented on YARN-4405: -- Committed to branch-2.8. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.8.0 > > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page
[ https://issues.apache.org/jira/browse/YARN-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065065#comment-15065065 ] Wangda Tan commented on YARN-4422: -- Committed to branch-2.8. > Generic AHS sometimes doesn't show started, node, or logs on App page > - > > Key: YARN-4422 > URL: https://issues.apache.org/jira/browse/YARN-4422 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 3.0.0, 2.8.0, 2.7.3 > > Attachments: AppAttemptPage no container or node.jpg, AppPage no logs > or node.jpg, YARN-4422.001.patch > > > Sometimes the AM container for an app isn't able to start the JVM. This can > happen if bogus JVM options are given to the AM container ( > {{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when > misconfiguring the AM container's environment variables > ({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz}}) > When the AM container for an app isn't able to start the JVM, the Application > page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and > {{Logs}} columns. It _does_ have links for each app attempt, and if you click > on one of them, you go to the Application Attempt page, where you can see all > containers with links to their logs and nodes, including the AM container. > But none of that shows up for the app attempts on the Application page. > Also, on the Application Attempt page, in the {{Application Attempt > Overview}} section, the {{AM Container}} value is {{null}} and the {{Node}} > value is {{N/A}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4424) Fix deadlock in RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065063#comment-15065063 ] Wangda Tan commented on YARN-4424: -- Committed to branch-2.8. > Fix deadlock in RMAppImpl > - > > Key: YARN-4424 > URL: https://issues.apache.org/jira/browse/YARN-4424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Jian He >Priority: Blocker > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-4424.1.patch > > > {code} > yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn > application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING > 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: > http://XXX:8188/ws/v1/timeline/ > 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at > XXX/0.0.0.0:8050 > 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History > server at XXX/0.0.0.0:10200 > {code} > {code:title=RM log} > 2015-12-04 21:59:19,744 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000 > 2015-12-04 22:00:50,945 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000 > 2015-12-04 22:02:22,416 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000 > 2015-12-04 22:03:53,593 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24 > 2015-12-04 22:05:24,856 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000 > 2015-12-04 22:06:56,235 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000 > 2015-12-04 22:08:27,510 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000 > 2015-12-04 22:09:58,786 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065066#comment-15065066 ] zhihai xu commented on YARN-4440: - yes, thanks [~leftnoteasy] for committing it to branch-2.8! > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.8.0 > > Attachments: YARN-4440.001.patch, YARN-4440.002.patch, > YARN-4440.003.patch > > > It seems there is a bug on {{FSAppAttempt#getAllowedLocalityLevelByTime}} > method > {code} > // default level is NODE_LOCAL > if (! allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > If you first invoke this method, it doesn't init time in > lastScheduledContainer and this will lead to execute these code for next > invokation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > the waitTime will subtract to FsApp startTime, and this will be easily more > than the delay time and allowedLocality degrade. Because FsApp startTime will > be start earlier than currentTimeMs. So we should add the initial time of > priority to prevent comparing with FsApp startTime and allowedLocalityLevel > degrade. And this problem will have more negative influence for small-jobs. > The YARN-4399 also discuss some problem in aspect of locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
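For illustration, here is a small, self-contained Java sketch of the fix idea in the description above: seed lastScheduledContainer with the current time the first time a priority is seen, so the later wait-time check compares against that timestamp instead of the much older application start time. The class, field names, and the 5-second delay are simplified stand-ins, not the FairScheduler code or the attached patch.
{code:title=LocalityDelaySketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the locality-delay logic described above.
public class LocalityDelaySketch {

  enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
  private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();
  // Pretend the app started a minute ago; comparing against this is the bug.
  private final long appStartTime = System.currentTimeMillis() - 60_000L;
  private final long nodeLocalityDelayMs = 5_000L;

  NodeType getAllowedLocalityLevelByTime(int priority, long currentTimeMs) {
    if (!allowedLocalityLevel.containsKey(priority)) {
      // The fix: seed the timestamp on first use so the next call does not
      // fall back to the (much older) application start time.
      lastScheduledContainer.put(priority, currentTimeMs);
      allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL);
      return NodeType.NODE_LOCAL;
    }
    long waitTime = currentTimeMs
        - lastScheduledContainer.getOrDefault(priority, appStartTime);
    if (waitTime > nodeLocalityDelayMs) {
      // Without the seeding above, small jobs would degrade here immediately.
      allowedLocalityLevel.put(priority, NodeType.RACK_LOCAL);
    }
    return allowedLocalityLevel.get(priority);
  }

  public static void main(String[] args) {
    LocalityDelaySketch sketch = new LocalityDelaySketch();
    long now = System.currentTimeMillis();
    System.out.println(sketch.getAllowedLocalityLevelByTime(1, now));        // NODE_LOCAL
    System.out.println(sketch.getAllowedLocalityLevelByTime(1, now + 1000)); // still NODE_LOCAL
  }
}
{code}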
[jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover
[ https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065067#comment-15065067 ] Wangda Tan commented on YARN-4392: -- Committed to branch-2.8. > ApplicationCreatedEvent event time resets after RM restart/failover > --- > > Key: YARN-4392 > URL: https://issues.apache.org/jira/browse/YARN-4392 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Naganarasimha G R >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch, > YARN-4392.2.patch, YARN-4392.3.patch > > > {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - > Finished time 1437453994768 is ahead of started time 1440308399674 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437454008244 is ahead of started time 1440308399676 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444305171 is ahead of started time 1440308399653 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444293115 is ahead of started time 1440308399647 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444379645 is ahead of started time 1440308399656 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444361234 is ahead of started time 1440308399655 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444342029 is ahead of started time 1440308399654 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444323447 is ahead of started time 1440308399654 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143730006 is ahead of started time 1440308399660 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143715698 is ahead of started time 1440308399659 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143719060 is ahead of started time 1440308399658 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444393931 is ahead of started time 1440308399657 > {code} . > From ATS logs, we would see a large amount of 'stale alerts' messages > periodically -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well
[ https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065058#comment-15065058 ] Wangda Tan commented on YARN-4418: -- Committed to branch-2.8. > AM Resource Limit per partition can be updated to ResourceUsage as well > --- > > Key: YARN-4418 > URL: https://issues.apache.org/jira/browse/YARN-4418 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, > 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch > > > AMResourceLimit is now extended to all partitions after YARN-3216. Its also > better to track this ResourceLimit in existing {{ResourceUsage}} so that REST > framework can be benefited to avail this information easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065060#comment-15065060 ] Wangda Tan commented on YARN-3946: -- Committed to branch-2.8. > Update exact reason as to why a submitted app is in ACCEPTED state to app's > diagnostic message > -- > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add container launch related debug information to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065059#comment-15065059 ] Wangda Tan commented on YARN-4309: -- Committed to branch-2.8. > Add container launch related debug information to container logs when a > container fails > --- > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, > YARN-4309.009.patch, YARN-4309.010.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such a approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh(into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
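A rough sketch of the debug-dump idea from the description above (list the container's local dir and copy launch_container.sh into the log dir so both survive log aggregation); the method, the directory.info output file, and the paths are illustrative assumptions, not the NodeManager implementation.
{code:title=ContainerDebugDumpSketch.java|borderStyle=solid}
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Illustrative only: write a listing of the container's local dir plus the
// launch script into the container's log dir so both survive log aggregation.
public class ContainerDebugDumpSketch {

  static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir)
      throws IOException {
    Path out = containerLogDir.resolve("directory.info");
    try (PrintWriter writer = new PrintWriter(Files.newBufferedWriter(out));
         Stream<Path> files = Files.walk(containerLocalDir)) {
      // 1. List everything under the container's local dir.
      files.forEach(p -> writer.println(p.toAbsolutePath()));
      // 2. Append the launch script contents for later inspection.
      Path launchScript = containerLocalDir.resolve("launch_container.sh");
      if (Files.exists(launchScript)) {
        writer.println("--- launch_container.sh ---");
        Files.readAllLines(launchScript).forEach(writer::println);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    dumpDebugInfo(Paths.get(args[0]), Paths.get(args[1]));
  }
}
{code}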
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065053#comment-15065053 ] Wangda Tan commented on YARN-4416: -- Committed to branch-2.8. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Hence we need to ensure following : > # queueCapacity, resource-usage has their own read/write lock hence > synchronization is not req > # numContainers is volatile hence synchronization is not req. > # read/write lock could be added to Ordering Policy. Read operations don't > need synchronized. So {{getNumApplications}} doesn't need synchronized. > (First 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
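To illustrate the direction described above (fields guarded by their own read/write lock, or declared volatile, no longer need the queue-wide synchronized monitor), here is a minimal Java sketch; the class and fields are simplified stand-ins for AbstractCSQueue, not the actual patch.
{code:title=QueueUsageSketch.java|borderStyle=solid}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for a queue whose getters avoid the queue-wide monitor.
public class QueueUsageSketch {

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity = 0f;
  private volatile int numContainers = 0; // volatile read needs no lock

  public float getAbsoluteUsedCapacity() {
    lock.readLock().lock();
    try {
      // Readers never block each other, and never block on the queue monitor.
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }

  public int getNumContainers() {
    return numContainers; // no synchronization required
  }
}
{code}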
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065054#comment-15065054 ] Wangda Tan commented on YARN-4225: -- Committed to branch-2.8. > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-4225.001.patch, YARN-4225.002.patch, > YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4440) FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time
[ https://issues.apache.org/jira/browse/YARN-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065057#comment-15065057 ] Wangda Tan commented on YARN-4440: -- Committed to branch-2.8. > FSAppAttempt#getAllowedLocalityLevelByTime should init the lastScheduler time > - > > Key: YARN-4440 > URL: https://issues.apache.org/jira/browse/YARN-4440 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Fix For: 2.8.0 > > Attachments: YARN-4440.001.patch, YARN-4440.002.patch, > YARN-4440.003.patch > > > It seems there is a bug on {{FSAppAttempt#getAllowedLocalityLevelByTime}} > method > {code} > // default level is NODE_LOCAL > if (! allowedLocalityLevel.containsKey(priority)) { > allowedLocalityLevel.put(priority, NodeType.NODE_LOCAL); > return NodeType.NODE_LOCAL; > } > {code} > If you first invoke this method, it doesn't init time in > lastScheduledContainer and this will lead to execute these code for next > invokation: > {code} > // check waiting time > long waitTime = currentTimeMs; > if (lastScheduledContainer.containsKey(priority)) { > waitTime -= lastScheduledContainer.get(priority); > } else { > waitTime -= getStartTime(); > } > {code} > the waitTime will subtract to FsApp startTime, and this will be easily more > than the delay time and allowedLocality degrade. Because FsApp startTime will > be start earlier than currentTimeMs. So we should add the initial time of > priority to prevent comparing with FsApp startTime and allowedLocalityLevel > degrade. And this problem will have more negative influence for small-jobs. > The YARN-4399 also discuss some problem in aspect of locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI
[ https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065055#comment-15065055 ] Wangda Tan commented on YARN-4293: -- Committed to branch-2.8. > ResourceUtilization should be a part of yarn node CLI > - > > Key: YARN-4293 > URL: https://issues.apache.org/jira/browse/YARN-4293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch, > 0003-YARN-4293.patch > > > In order to get resource utilization information easier, "yarn node" CLI > should include resource utilization on the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065043#comment-15065043 ] Jun Gong commented on YARN-3480: Thanks for review and suggestion! {quote} regarding this logic, it is possible that a particular attempt is not persisted in the store because of some connection failures. so the app.nextAttemptId - app.firstAttemptIdInStateStore does not necessarily indicate the number of attempts. {quote} If RMStateStore fails to persist any attempt, it will transition to state 'RMStateStoreState.FENCED'. There will be no operations performed if RMStateStore is in this state. So it will not be a problem? {quote} LevelDBRMStateStore#removeApplicationAttemptInternal does not need to use batch operation, as it only has one operation Could you also add a test case in RMStateStoreTestBase#testRMAppStateStore that the loading part also works correctly? i.e. loading an app with partial attempts works correctly. {quote} Thanks, I will fix them. > Recovery may get very slow with lots of services with lots of app-attempts > -- > > Key: YARN-3480 > URL: https://issues.apache.org/jira/browse/YARN-3480 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3480.01.patch, YARN-3480.02.patch, > YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, > YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, > YARN-3480.09.patch, YARN-3480.10.patch > > > When RM HA is enabled and running containers are kept across attempts, apps > are more likely to finish successfully with more retries(attempts), so it > will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However > it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make > RM recover process much slower. It might be better to set max attempts to be > stored in RMStateStore. > BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to > a small value, retried attempts might be very large. So we need to delete > some attempts stored in RMStateStore and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
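As a rough illustration of the capping idea discussed in this JIRA (keep only the most recent attempts in the state store and drop the oldest before persisting a new one), here is a small, self-contained Java sketch; the class, the maxAttemptsInStateStore name, and the eviction policy are assumptions for illustration, not the RMStateStore API or the attached patches.
{code:title=AttemptCapSketch.java|borderStyle=solid}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: cap the number of attempts kept in the state store by
// evicting the oldest stored attempt before persisting a new one.
public class AttemptCapSketch {

  private final int maxAttemptsInStateStore;
  private final Deque<Integer> storedAttemptIds = new ArrayDeque<>();

  public AttemptCapSketch(int maxAttemptsInStateStore) {
    this.maxAttemptsInStateStore = maxAttemptsInStateStore;
  }

  public void storeAttempt(int attemptId) {
    while (storedAttemptIds.size() >= maxAttemptsInStateStore) {
      int oldest = storedAttemptIds.removeFirst();
      removeAttemptFromStore(oldest); // e.g. delete the znode / file
    }
    storedAttemptIds.addLast(attemptId);
    System.out.println("persisted attempt " + attemptId);
  }

  private void removeAttemptFromStore(int attemptId) {
    System.out.println("removed attempt " + attemptId + " from the store");
  }

  public static void main(String[] args) {
    AttemptCapSketch store = new AttemptCapSketch(3);
    for (int id = 1; id <= 5; id++) {
      store.storeAttempt(id); // attempts 1 and 2 are evicted along the way
    }
  }
}
{code}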
[jira] [Updated] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4405: -- Fix Version/s: 2.8.0 > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.8.0 > > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"
[ https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064956#comment-15064956 ] Carlo Curino commented on YARN-4195: [~leftnoteasy], I think you are guessing right (with a wrong example)... The user during job/reservation submission can say {{GPU}} and internally the system will translate this in {{GPU_PUBLICIP OR GPU_NOT-PUBLICIP}} and thus match any container from either of the two underlying partitions. Even nicer would be to allow each node to carry an arbitrary set of labels ({{GPU}}, {{PUBLICIP}}), and the system automatically infer partitions (from node label specification). A configuration helper tool could show the Admin the list partitions, and their capacity and help configure queues by specifying capacity allocations per-partition (or per-label with some validation happening behind the scene). As the number of "active" partitions (vs the number of all possible partitions) is typically much smaller (and bound by the number of nodes), this should be generally feasible. Speaking with [~kasha], this would also go very well with some of the ideas for a schedule refactoring / support for node labels in the {{FairScheduler}} he is considering. > Support of node-labels in the ReservationSystem "Plan" > -- > > Key: YARN-4195 > URL: https://issues.apache.org/jira/browse/YARN-4195 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4195.patch > > > As part of YARN-4193 we need to enhance the InMemoryPlan (and related > classes) to track the per-label available resources, as well as the per-label > reservation-allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064931#comment-15064931 ] Sangjin Lee commented on YARN-4224: --- Thanks [~gtCarrera9] for the update! Then I'd like to put forward a proposal more formally (it's not a new proposal). - adopt Li's original proposal (2nd approach mentioned in [this comment|https://issues.apache.org/jira/browse/YARN-4224?focusedCommentId=15052865&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15052865]) - permit omitting part of the path that can be omitted (need documentation on the permitted cases) - also support a UID-based URL as a shorthand for the long path-based URL, but *clearly document what type of queries support UIDs* I am still not 100% convinced that we should not make composing UIDs public (so that clients themselves can compose them, instead of a server-based end point). This is my proposal/opinion, so obviously yours may be different. Thoughts? Comments? > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064915#comment-15064915 ] Hudson commented on YARN-4164: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8999 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8999/]) YARN-4164. Changed updateApplicationPriority API to return the updated (jianhe: rev 85c24660481f33684a42a7f6d665d3117577c780) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns empty UpdateApplicationPriorityResponse response. > But RM update priority to the cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenarios, need to intimate back > to client that updated priority rather just keeping quite where client > assumes that given priority itself is taken. > During application submission also has same scenario can happen, but I feel > when > explicitly invoke via ApplicationClientProtocol#updateApplicationPriority(), > response should have updated priority in response. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064917#comment-15064917 ] Hadoop QA commented on YARN-4234: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 7s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 19s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 56s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} Patch generated 12 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 237, now 246). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 55s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 54s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 7s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not gen
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064910#comment-15064910 ] Sangjin Lee commented on YARN-4238: --- Thanks [~varun_saxena] for the summary, and [~Naganarasimha] for bringing up the matter. To elaborate on a couple of points, although the modified time was introduced early in the data model, I don't think we had things like queries based on the modified time in mind. It was suggested more for the usefulness in terms of troubleshooting ("when was the entity last written to?"), but not much more. One analogy may be the often-present modified timestamp columns in SQL tables. Other than that, I'm generally agreeing with Varun's summary. At minimum, the modified timestamp should not be set by clients. And we should probably drop the modified time from the schema in general. We can leave it in the data model, but I'm still not 100% sure if we even need to bother with that. I'd like to hear your thoughts. > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4483) HDP 2.2.9 expands the scope of AMBARI-11358.
[ https://issues.apache.org/jira/browse/YARN-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-4483. -- Resolution: Invalid [~dmtucker], AMBARI-11358 should handle this already; I think there's nothing that needs to be done on the YARN side. Closing as invalid. > HDP 2.2.9 expands the scope of AMBARI-11358. > > > Key: YARN-4483 > URL: https://issues.apache.org/jira/browse/YARN-4483 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 > Environment: HDP 2.2.9.0-3393, Ambari 1.7.0 >Reporter: David Tucker > > YARN will not start until > yarn.scheduler.capacity.root.accessible-node-labels.default.capacity and > yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity > are changed from their default value (-1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected
[ https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064886#comment-15064886 ] Wangda Tan commented on YARN-4257: -- Hi [~rhaase], Thanks for working on this patch. The patch looks good to me; could you make the "minimum allocation should > 0" test a parameterized test, so we don't have to duplicate it for the 3 different schedulers? Maybe we could add it to TestAbstractYarnScheduler? > Move scheduler validateConf method to AbstractYarnScheduler and make it > protected > - > > Key: YARN-4257 > URL: https://issues.apache.org/jira/browse/YARN-4257 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Swapnil Daingade >Assignee: Rich Haase > Labels: easyfix > Attachments: YARN-4257.patch > > > Currently FairScheduler, CapacityScheduler and FifoScheduler each have a > method private void validateConf(Configuration conf). > All three methods validate the minimum and maximum scheduler allocations for > cpu and memory (with minor differences). FairScheduler supports 0 as minimum > allocation for cpu and memory, while CapacityScheduler and FifoScheduler do > not. We can move this code to AbstractYarnScheduler (avoids code duplication) > and make it protected for individual schedulers to override. > Why do we care about a minimum allocation of 0 for cpu and memory? > We contribute to a project called Apache Myriad that runs yarn on mesos. > Myriad supports a feature called fine-grained scaling (fgs). In fgs, a NM is > launched with zero capacity (0 cpu and 0 mem). When a yarn container is to be > run on the NM, a mesos offer for that node is accepted and the NM capacity is > dynamically scaled up to match the accepted mesos offer. On completion of the > yarn container, resources are returned back to Mesos and the NM capacity is > scaled down back to zero (cpu & mem). > In ResourceTrackerService.registerNodeManager, yarn checks if the NM capacity > is at least as much as yarn.scheduler.minimum-allocation-mb and > yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in > yarn-site.xml (so a zero capacity NM is possible). However, the validateConf > methods in CapacityScheduler and FifoScheduler do not allow for 0 values for > these properties (The FairScheduler one does allow for 0). This behaviour > should be consistent or at least be overridable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
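A hedged sketch of the parameterized-test suggestion above, using JUnit 4's Parameterized runner so the minimum-allocation check is written once and run against all three scheduler configurations; the validateMinimumAllocation helper, the class name, and the floor values are stand-ins, not the real scheduler test code.
{code:title=TestValidateConfSketch.java|borderStyle=solid}
import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Sketch only: one parameterized test covering the minimum-allocation check
// for all three schedulers instead of three copies of the same test.
@RunWith(Parameterized.class)
public class TestValidateConfSketch {

  @Parameters(name = "{0}")
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {"CapacityScheduler", 1024, 1, 1},
        {"FifoScheduler", 1024, 1, 1},
        {"FairScheduler", 0, 0, 0}, // FairScheduler permits a 0 minimum
    });
  }

  private final String scheduler;
  private final int minMemoryMb;
  private final int minVcores;
  private final int allowedFloor;

  public TestValidateConfSketch(String scheduler, int minMemoryMb,
      int minVcores, int allowedFloor) {
    this.scheduler = scheduler;
    this.minMemoryMb = minMemoryMb;
    this.minVcores = minVcores;
    this.allowedFloor = allowedFloor;
  }

  // Stand-in for the shared AbstractYarnScheduler#validateConf logic.
  private boolean validateMinimumAllocation(int memoryMb, int vcores, int floor) {
    return memoryMb >= floor && vcores >= floor;
  }

  @Test
  public void minimumAllocationRespectsSchedulerFloor() {
    assertTrue(scheduler + " should accept its configured minimum",
        validateMinimumAllocation(minMemoryMb, minVcores, allowedFloor));
  }
}
{code}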
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064880#comment-15064880 ] Hadoop QA commented on YARN-4138: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 249, now 249). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 47s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 157m 32s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778547/YARN-4138.3
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064874#comment-15064874 ] Wangda Tan commented on YARN-4304: -- Hi [~sunilg], 1) The changes for the max-available-to-a-queue could be separated into a separate patch. The major concern is - performance: For every allocated container, we need to iterate over all labels to get the total resources. - I think the longer-term fix should be to add by-partition info to queue metrics, including max/guaranteed/available/used, etc. I can help to review proposal/patches. 2) There are several methods using: {code} public synchronized Resource getAMResourceLimitPerPartition( String nodePartition) {code} I think after we have YARN-4418, we don't need to calculate AMResourceLimitPerPartition every time. So I suggest splitting the method into calculate-and-get and read-only methods. How about calling them calculateAndGetAMResourceLimitPerPartition and getAMResourceLimitPerPartition? {{getPendingAppDiagnosticMessage}}/REST-API will use the read-only interface. Agree? 3) Could you upload screenshots/REST api responses? To your question: bq. I am giving AM Resource limit for user AM limit, This is not really correct. But to get this, we need to have some round about way. I gave this implementation based on below comment. Is this what you also expect? Is it possible to use {{org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo#users}} to get AM Resource Limit instead? Do you have any concerns with this approach? I think using the same queue's limit for both the queue and the user may not be correct. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, > 0005-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
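As a sketch of the calculate-and-get vs. read-only split suggested in point 2 above, the following shows one way the two methods could relate: the scheduling path recomputes and caches the per-partition limit, while diagnostics and REST read the cached value. The limit formula, class, and field names are placeholders, not LeafQueue code.
{code:title=AmLimitSketch.java|borderStyle=solid}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Placeholder sketch of a calculate-and-get vs. read-only split.
public class AmLimitSketch {

  private final ConcurrentMap<String, Long> amLimitByPartition =
      new ConcurrentHashMap<>();

  // Scheduling path: recompute the limit and cache it per partition.
  public long calculateAndGetAMResourceLimitPerPartition(
      String partition, long partitionResourceMb, float maxAmPercent) {
    long limit = (long) (partitionResourceMb * maxAmPercent);
    amLimitByPartition.put(partition, limit);
    return limit;
  }

  // Read-only path for diagnostics / REST: never recomputes.
  public long getAMResourceLimitPerPartition(String partition) {
    return amLimitByPartition.getOrDefault(partition, 0L);
  }

  public static void main(String[] args) {
    AmLimitSketch queue = new AmLimitSketch();
    queue.calculateAndGetAMResourceLimitPerPartition("gpu", 102400L, 0.1f);
    System.out.println(queue.getAMResourceLimitPerPartition("gpu")); // 10240
  }
}
{code}
Keeping the read-only accessor free of recomputation avoids paying the calculation cost on every UI or REST query; whether the cache lives in ResourceUsage or elsewhere is a design choice left to the actual patch.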
[jira] [Created] (YARN-4483) HDP 2.2.9 expands the scope of AMBARI-11358.
David Tucker created YARN-4483: -- Summary: HDP 2.2.9 expands the scope of AMBARI-11358. Key: YARN-4483 URL: https://issues.apache.org/jira/browse/YARN-4483 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: HDP 2.2.9.0-3393, Ambari 1.7.0 Reporter: David Tucker YARN will not start until yarn.scheduler.capacity.root.accessible-node-labels.default.capacity and yarn.scheduler.capacity.root.accessible-node-labels.default.maximum-capacity are changed from their default value (-1). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064854#comment-15064854 ] Jian He commented on YARN-4164: --- lgtm, +1 > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns empty UpdateApplicationPriorityResponse response. > But RM update priority to the cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenarios, need to intimate back > to client that updated priority rather just keeping quite where client > assumes that given priority itself is taken. > During application submission also has same scenario can happen, but I feel > when > explicitly invoke via ApplicationClientProtocol#updateApplicationPriority(), > response should have updated priority in response. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064816#comment-15064816 ] Xuan Gong commented on YARN-4234: - upload a patch to fix the checkstyle issue > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4473) Add version information for the application and the application attempts
[ https://issues.apache.org/jira/browse/YARN-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola reassigned YARN-4473: -- Assignee: Giovanni Matteo Fumarola (was: Marco Rabozzi) > Add version information for the application and the application attempts > > > Key: YARN-4473 > URL: https://issues.apache.org/jira/browse/YARN-4473 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > > In order to allow to upgrade an application master across different attempts, > we need to keep track of different attempts versions and provide a mean to > temporarily store the upgrade information until the upgrade completes. > Concretely we would add: > - A version identifier for each attempt > - A temporary upgrade context for each application -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.2015-12-18.1.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.2015-12-18.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.patch, YARN-4234.20151109.patch, > YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.2015-11-18.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"
[ https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064812#comment-15064812 ] Wangda Tan commented on YARN-4195: -- Hi [~curino], Thanks for working on this; I've taken a look at your description and patch. Could you provide an example of the GPU/PUBLICIP queue configuration and resource request *when the proposal is completed*? I may not understand it completely; I guess it may look like: - Since the system still uses partition to track resources, there are still 4 partitions: GPU_PUBLICIP, GPU_NOT-PUBLICIP, NOT-GPU_PUBLICIP, NOT-GPU_NOT-PUBLICIP. The admin still needs to configure the capacity of queues for the 4 partitions. - The node label expression of a Resource Request will be mapped to partitions; say an expression = GPU, it becomes two resource requests internally: GPU_PUBLICIP and PUBLICIP. > Support of node-labels in the ReservationSystem "Plan" > -- > > Key: YARN-4195 > URL: https://issues.apache.org/jira/browse/YARN-4195 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4195.patch > > > As part of YARN-4193 we need to enhance the InMemoryPlan (and related > classes) to track the per-label available resources, as well as the per-label > reservation-allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
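To make the label-to-partition expansion concrete (consistent with the GPU example and the GPU_PUBLICIP OR GPU_NOT-PUBLICIP translation described in the reply further up this thread), here is a small, self-contained Java sketch; the partition naming and data structures are illustrative assumptions, not the ReservationSystem or node-labels code.
{code:title=LabelToPartitionSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: expand a single-label expression into the partitions
// that carry that label, mirroring the GPU / PUBLICIP example in this thread.
public class LabelToPartitionSketch {

  static List<String> partitionsContaining(String label,
      List<Set<String>> partitions) {
    List<String> matches = new ArrayList<>();
    for (Set<String> partition : partitions) {
      if (partition.contains(label)) {
        matches.add(String.join("_", partition));
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<Set<String>> partitions = Arrays.asList(
        new TreeSet<>(Arrays.asList("GPU", "PUBLICIP")),
        new TreeSet<>(Arrays.asList("GPU", "NOT-PUBLICIP")),
        new TreeSet<>(Arrays.asList("NOT-GPU", "PUBLICIP")),
        new TreeSet<>(Arrays.asList("NOT-GPU", "NOT-PUBLICIP")));
    // A request labelled "GPU" matches both partitions that contain GPU.
    System.out.println(partitionsContaining("GPU", partitions));
  }
}
{code}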
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064794#comment-15064794 ] Jian He commented on YARN-3480: --- Thanks for updating. - Regarding this logic: it is possible that a particular attempt is not persisted in the store because of some connection failure, so {{app.nextAttemptId - app.firstAttemptIdInStateStore}} does not necessarily indicate the number of attempts.
{code}
while (app.nextAttemptId - app.firstAttemptIdInStateStore > app.maxAppAttempts) {
{code}
- LevelDBRMStateStore#removeApplicationAttemptInternal does not need to use a batch operation, as it only has one operation. - Could you also add a test case in RMStateStoreTestBase#testRMAppStateStore to verify that the loading part also works correctly, i.e. loading an app with partial attempts works correctly? > Recovery may get very slow with lots of services with lots of app-attempts > -- > > Key: YARN-3480 > URL: https://issues.apache.org/jira/browse/YARN-3480 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3480.01.patch, YARN-3480.02.patch, > YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, > YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, > YARN-3480.09.patch, YARN-3480.10.patch > > > When RM HA is enabled and running containers are kept across attempts, apps > are more likely to finish successfully with more retries(attempts), so it > will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However > it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make > RM recover process much slower. It might be better to set max attempts to be > stored in RMStateStore. > BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to > a small value, retried attempts might be very large. So we need to delete > some attempts stored in RMStateStore and RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
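To make the bookkeeping concern above concrete, here is a minimal, hypothetical sketch of trimming attempts based on the set of attempt IDs actually found in the state store rather than on ID arithmetic. The {{storedAttemptIds}} collection and the single-attempt removal call are assumed names for illustration, not the actual YARN-3480 patch.
{code}
// Illustrative sketch only: remove the oldest *persisted* attempts until the number of
// stored attempts (not the nextAttemptId - firstAttemptIdInStateStore difference, which
// can be inflated by gaps from failed store operations) is within maxAppAttempts.
Iterator<ApplicationAttemptId> it = storedAttemptIds.iterator();  // assumed: oldest first
while (storedAttemptIds.size() > app.maxAppAttempts && it.hasNext()) {
  ApplicationAttemptId oldest = it.next();
  stateStore.removeApplicationAttempt(oldest);  // assumed single-attempt removal API
  it.remove();
}
{code}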
[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064783#comment-15064783 ] Wangda Tan commented on YARN-4290: -- Thanks [~Naganarasimha]. [~sunilg], could you check the failed tests? I will commit in a few days. > "yarn nodes -list" should print all nodes reports information > - > > Key: YARN-4290 > URL: https://issues.apache.org/jira/browse/YARN-4290 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0002-YARN-4290.patch > > > Currently, "yarn nodes -list" command only shows > - "Node-Id", > - "Node-State", > - "Node-Http-Address", > - "Number-of-Running-Containers" > I think we need to show more information such as used resource, just like > "yarn nodes -status" command. > Maybe we can add a parameter to -list, such as "-show-details" to enable > printing all detailed information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064774#comment-15064774 ] Hadoop QA commented on YARN-4290: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 19s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 31s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 143m 23s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_91 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.ha
[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064737#comment-15064737 ] Daniel Zhi commented on YARN-914: - Thanks. Always committing to trunk first makes a lot of sense to me. We would need to port the code to trunk and likely build an AMI image with it to leverage our internal verification test system. Our implementation is largely in sync with the architecture and ideas in the JIRA design document. On the other hand, there are additional details and component-level designs that the JIRA design document does not necessarily discuss or touch. These details naturally surfaced during the development iterations, and the corresponding designs became mature and stabilized. One example is the DecommissioningNodeWatcher, which, embedded in ResourceTrackingService, tracks the status of DECOMMISSIONING nodes automatically and asynchronously after the client/admin makes the graceful decommission request. Another example is per-node decommission timeout support, which is useful for decommissioning a node that will be terminated soon. > (Umbrella) Support graceful decommission of nodemanager > --- > > Key: YARN-914 > URL: https://issues.apache.org/jira/browse/YARN-914 > Project: Hadoop YARN > Issue Type: Improvement > Components: graceful >Affects Versions: 2.0.4-alpha >Reporter: Luke Lu >Assignee: Junping Du > Attachments: Gracefully Decommission of NodeManager (v1).pdf, > Gracefully Decommission of NodeManager (v2).pdf, > GracefullyDecommissionofNodeManagerv3.pdf > > > When NMs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact to running applications. > Currently if a NM is decommissioned, all running containers on the NM need to > be rescheduled on other NMs. Further more, for finished map tasks, if their > map output are not fetched by the reducers of the job, these map tasks will > need to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064680#comment-15064680 ] Hadoop QA commented on YARN-2934: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 279, now 279). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 51s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 30s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {colo
[jira] [Updated] (YARN-4482) Default values of several config parameters are missing
[ https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianyin Xu updated YARN-4482: - Description: In {{yarn-default.xml}}, the default values of the following parameters are commented out, {{yarn.client.failover-max-attempts}} {{yarn.client.failover-sleep-base-ms}} {{yarn.client.failover-sleep-max-ms}} Are these default values changed (I suppose so)? If so, we should update the new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" values... (yarn-default.xml) https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml Thanks! was: In {{yarn-default.xml}}, the default values of the following parameters are commented out, {{yarn.client.failover-max-attempts}} {{yarn.client.failover-sleep-base-ms}} {{yarn.client.failover-sleep-max-ms}} Are these default values changed (I suppose so)? If so, we should update the new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" values... Thanks! > Default values of several config parameters are missing > > > Key: YARN-4482 > URL: https://issues.apache.org/jira/browse/YARN-4482 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.6.2, 2.6.3 >Reporter: Tianyin Xu >Priority: Minor > > In {{yarn-default.xml}}, the default values of the following parameters are > commented out, > {{yarn.client.failover-max-attempts}} > {{yarn.client.failover-sleep-base-ms}} > {{yarn.client.failover-sleep-max-ms}} > Are these default values changed (I suppose so)? If so, we should update the > new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" > values... > (yarn-default.xml) > https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4482) Default values of several config parameters are missing
[ https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tianyin Xu updated YARN-4482: - Description: In {{yarn-default.xml}}, the default values of the following parameters are commented out, {{yarn.client.failover-max-attempts}} {{yarn.client.failover-sleep-base-ms}} {{yarn.client.failover-sleep-max-ms}} Are these default values changed (I suppose so)? If so, we should update the new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" values... (yarn-default.xml) https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml Thanks! was: In {{yarn-default.xml}}, the default values of the following parameters are commented out, {{yarn.client.failover-max-attempts}} {{yarn.client.failover-sleep-base-ms}} {{yarn.client.failover-sleep-max-ms}} Are these default values changed (I suppose so)? If so, we should update the new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" values... (yarn-default.xml) https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml Thanks! > Default values of several config parameters are missing > > > Key: YARN-4482 > URL: https://issues.apache.org/jira/browse/YARN-4482 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.6.2, 2.6.3 >Reporter: Tianyin Xu >Priority: Minor > > In {{yarn-default.xml}}, the default values of the following parameters are > commented out, > {{yarn.client.failover-max-attempts}} > {{yarn.client.failover-sleep-base-ms}} > {{yarn.client.failover-sleep-max-ms}} > Are these default values changed (I suppose so)? If so, we should update the > new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" > values... > (yarn-default.xml) > https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4482) Default values of several config parameters are missing
Tianyin Xu created YARN-4482: Summary: Default values of several config parameters are missing Key: YARN-4482 URL: https://issues.apache.org/jira/browse/YARN-4482 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.6.2, 2.6.3 Reporter: Tianyin Xu Priority: Minor In {{yarn-default.xml}}, the default values of the following parameters are commented out, {{yarn.client.failover-max-attempts}} {{yarn.client.failover-sleep-base-ms}} {{yarn.client.failover-sleep-max-ms}} Are these default values changed (I suppose so)? If so, we should update the new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" values... Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064569#comment-15064569 ] Hadoop QA commented on YARN-4428: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 21s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 7, now 7). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server (total was 163, now 163). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 18s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 149m 36s
[jira] [Commented] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064513#comment-15064513 ] Wangda Tan commented on YARN-4454: -- [~bibinchundatt], thanks for reporting and looking at the issue. The root cause of this issue is that when the RM restarts the first time, it generates a mirror file which has the complete node->label mappings:
{code}
node1:port=x
node1=y
{code}
And when we restart the RM again, we load the mapping, but node1:port is loaded first, so node1=y overwrites the previous one. In {{org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager#checkReplaceLabelsOnNode}}, instead of directly iterating over the map:
{code}
for (Entry<NodeId, Set<String>> entry : replaceLabelsToNode.entrySet()) {
  NodeId nodeId = entry.getKey();
{code}
we should sort the map entries so that a node without a port is handled before nodes with a port specified, to avoid the overwriting. Does that make sense to you? > NM to nodelabel mapping going wrong after RM restart > > > Key: YARN-4454 > URL: https://issues.apache.org/jira/browse/YARN-4454 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: test.patch > > > *Nodelabel mapping with NodeManager is going wrong if combination of > hostname and then NodeId is used to update nodelabel mapping* > *Steps to reproduce* > 1.Create cluster with 2 NM > 2.Add label X,Y to cluster > 3.replace Label of node 1 using ,x > 4.replace label for node 1 by ,y > 5.Again replace label of node 1 by ,x > Check cluster label mapping HOSTNAME1 will be mapped with X > Now restart RM 2 times NODE LABEL mapping of HOSTNAME1:PORT changes to Y > {noformat} > 2015-12-14 17:17:54,901 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: > [,] > 2015-12-14 17:17:54,905 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on > nodes: > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:64318, labels=[ResourcePool_1] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:0, labels=[ResourcePool_null] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-187:64318, labels=[ResourcePool_null] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
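A minimal, hypothetical sketch of the ordering idea suggested above. It assumes that a NodeId with port 0 means "hostname only"; the comparator and surrounding shape are illustrative only, not the actual YARN-4454 patch.
{code}
// Illustrative sketch only: handle host-only entries (port == 0) first so that a later
// host:port mapping is not overwritten by the host-only one.
List<Map.Entry<NodeId, Set<String>>> entries =
    new ArrayList<>(replaceLabelsToNode.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<NodeId, Set<String>>>() {
  @Override
  public int compare(Map.Entry<NodeId, Set<String>> a, Map.Entry<NodeId, Set<String>> b) {
    // entries whose NodeId has no port sort first
    return Integer.compare(a.getKey().getPort() == 0 ? 0 : 1,
                           b.getKey().getPort() == 0 ? 0 : 1);
  }
});
for (Map.Entry<NodeId, Set<String>> entry : entries) {
  NodeId nodeId = entry.getKey();
  // ... existing per-node label replacement checks ...
}
{code}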
[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-4138: Attachment: YARN-4138.3.patch Attaching the latest patch that addresses [~jianhe]'s and [~sandflee]'s comments. I think the issue brought up by [~jianhe] is about race conditions between a normal resource decrease and a resource rollback. The proposed fix is to guard the resource rollback with the same sequence of locks as the normal resource decrease, i.e., lock on the application first, then on the scheduler. With the proposed fix, we can walk through the original example: 1. AM asks to increase 2G -> 8G, and it is approved by the RM. 2. AM does not increase the container, AM asks to decrease to 1G, and at the same time the increase expiration logic is triggered: * If the normal decrease is processed first: RM decreases 8G -> 1G (allocated and lastConfirmed are now set to 1G), and then the rollback is processed: RM rolls back 1G -> 1G (skip). * If the rollback is processed first: RM rolls back 8G -> 2G (allocated and lastConfirmed are now set to 2G), and then the normal decrease is processed: RM decreases 2G -> 1G. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
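A rough, hypothetical sketch of the lock ordering described above; every accessor, helper, and lock name here is an assumption for illustration, not the actual YARN-4138 patch. The point is only that the rollback path takes the application lock before the scheduler lock, the same order as the normal decrease path, so the two operations cannot interleave and lastConfirmed stays consistent.
{code}
// Illustrative only: same lock order (application, then scheduler) on both paths.
void rollbackContainerResource(SchedulerApplicationAttempt app, RMContainer rmContainer) {
  synchronized (app) {                 // 1. application lock, as in the decrease path
    synchronized (schedulerLock) {     // 2. then the scheduler-side lock (assumed field)
      Resource target = rmContainer.getLastConfirmedResource();   // assumed accessor
      if (!rmContainer.getAllocatedResource().equals(target)) {   // assumed accessor
        decreaseContainer(rmContainer, target);                   // assumed internal helper
      }
      // if allocated == lastConfirmed the rollback is a no-op ("skip" in the example above)
    }
  }
}
{code}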
[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064457#comment-15064457 ] Sangjin Lee commented on YARN-4350: --- Thanks for digging deep into this [~Naganarasimha]. There might be a little bigger issue with the non-test code then, and it might not be limited to our branch. I'm OK with fixing the issues that are specific to our branch here, and handling the bigger issue in a trunk JIRA (YARN-4385?). Thoughts? > TestDistributedShell fails for V2 scenarios > --- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch, > YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4198) CapacityScheduler locking / synchronization improvements
[ https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064443#comment-15064443 ] Wangda Tan commented on YARN-4198: -- [~atumanov], Could you rebase your patch against latest trunk? Thanks, > CapacityScheduler locking / synchronization improvements > > > Key: YARN-4198 > URL: https://issues.apache.org/jira/browse/YARN-4198 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Alexey Tumanov > Attachments: YARN-4198-v1.patch > > > In the context of YARN-4193 (which stresses the RM/CS performance) we found > several performance problems with in the locking/synchronization of the > CapacityScheduler, as well as inconsistencies that do not normally surface > (incorrect locking-order of queues protected by CS locks etc). This JIRA > proposes several refactoring that improve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064438#comment-15064438 ] Naganarasimha G R commented on YARN-4290: - +1 patch LGTM ! > "yarn nodes -list" should print all nodes reports information > - > > Key: YARN-4290 > URL: https://issues.apache.org/jira/browse/YARN-4290 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0002-YARN-4290.patch > > > Currently, "yarn nodes -list" command only shows > - "Node-Id", > - "Node-State", > - "Node-Http-Address", > - "Number-of-Running-Containers" > I think we need to show more information such as used resource, just like > "yarn nodes -status" command. > Maybe we can add a parameter to -list, such as "-show-details" to enable > printing all detailed information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064413#comment-15064413 ] Wangda Tan commented on YARN-4290: -- [~sunilg], Patch looks good to me! > "yarn nodes -list" should print all nodes reports information > - > > Key: YARN-4290 > URL: https://issues.apache.org/jira/browse/YARN-4290 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0002-YARN-4290.patch > > > Currently, "yarn nodes -list" command only shows > - "Node-Id", > - "Node-State", > - "Node-Http-Address", > - "Number-of-Running-Containers" > I think we need to show more information such as used resource, just like > "yarn nodes -status" command. > Maybe we can add a parameter to -list, such as "-show-details" to enable > printing all detailed information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2934: Attachment: YARN-2934.v2.003.patch [~jira.shegalov], incorporating the changes for ??Right now we are blindly grabbing file 0. It would however make much more sense to grab the most recent (highest mtime) non-empty file??. Please review. The earlier test-case failures are not related to the modifications in this patch, and the tests pass locally. > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, > YARN-2934.v1.006.patch, YARN-2934.v1.007.patch, YARN-2934.v1.008.patch, > YARN-2934.v2.001.patch, YARN-2934.v2.002.patch, YARN-2934.v2.003.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064364#comment-15064364 ] Junping Du commented on YARN-4234: -- Thanks [~xgong] for updating the patch. The latest patch LGTM overall. However, it seems many of the checkstyle issues reported by Jenkins are related to this patch. Maybe we should fix them before we get the patch in? > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-17.1.patch, YARN-4234.20151109.patch, > YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064322#comment-15064322 ] Hadoop QA commented on YARN-4389: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 52s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 161, now 166). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 21s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} | {color:
[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent
[ https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064234#comment-15064234 ] Sunil G commented on YARN-4003: --- Hi [~curino], thank you for the clarification. Yes, there is no cleaner solution. But I think we could calculate the AMUsed capacity of all ReservationQueues under one parent (PlanQueue). So could we have an API like the one below in {{PlanQueue}} and use it along with {{getAMResourceLimit}} to ensure that we do not cross the limit of the parent queue? I might be wrong, please correct me if so.
{code}
public synchronized Resource sumOfChildAMUsedCapacities() {
  Resource ret = Resources.createResource(0);
  for (CSQueue l : childQueues) {
    Resources.addTo(ret, l.getQueueResourceUsage().getAMUsed());
  }
  return ret;
}
{code}
> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is > not consistent > > > Key: YARN-4003 > URL: https://issues.apache.org/jira/browse/YARN-4003 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4003.patch > > > The inherited behavior from LeafQueue (limit AM % based on capacity) is not a > good fit for ReservationQueue (that have highly dynamic capacity). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
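For illustration, a hedged sketch of how the proposed helper might be combined with {{getAMResourceLimit}}; the surrounding variable names and the idea of capping by the parent's remaining AM headroom are assumptions for this sketch, not something decided in this JIRA.
{code}
// Illustrative only: cap a ReservationQueue's AM limit by the AM headroom still available
// under its parent PlanQueue. parentAmLimit, resourceCalculator and clusterResource are
// placeholders; sumOfChildAMUsedCapacities() is the helper proposed above.
Resource siblingsAmUsed = planQueue.sumOfChildAMUsedCapacities();
Resource parentHeadroom = Resources.subtract(parentAmLimit, siblingsAmUsed);
Resource effectiveLimit = Resources.min(resourceCalculator, clusterResource,
    getAMResourceLimit(), parentHeadroom);
{code}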
[jira] [Updated] (YARN-4290) "yarn nodes -list" should print all nodes reports information
[ https://issues.apache.org/jira/browse/YARN-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4290: -- Attachment: 0002-YARN-4290.patch Uploading a new patch as the related tickets are committed. Sample output:
{noformat}
root@sunil-Inspiron-3543:/opt/hadoop/trunk/hadoop-3.0.0-SNAPSHOT/bin# ./yarn node -list -showDetails
15/12/18 21:24:21 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:25001
Total Nodes:1
         Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
 localhost:25006         RUNNING    localhost:25008                             0
Detailed Node Information :
        Configured Resources :
        Allocated Resources :
        Resource Utilization by Node : PMem:4884 MB, VMem:4884 MB, VCores:2.5824726
        Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
        Node-Labels :
{noformat}
[~leftnoteasy] and [~Naganarasimha Garla], please help to share your thoughts. > "yarn nodes -list" should print all nodes reports information > - > > Key: YARN-4290 > URL: https://issues.apache.org/jira/browse/YARN-4290 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0002-YARN-4290.patch > > > Currently, "yarn nodes -list" command only shows > - "Node-Id", > - "Node-State", > - "Node-Http-Address", > - "Number-of-Running-Containers" > I think we need to show more information such as used resource, just like > "yarn nodes -status" command. > Maybe we can add a parameter to -list, such as "-show-details" to enable > printing all detailed information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4428: --- Attachment: YARN-4428.2.2.patch .2.2 addressed the whitespace issue > Redirect RM page to AHS page when AHS turned on and RM page is not avaialable > - > > Key: YARN-4428 > URL: https://issues.apache.org/jira/browse/YARN-4428 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, > YARN-4428.2.2.patch, YARN-4428.2.patch > > > When AHS is turned on, if we can't view application in RM page, RM page > should redirect us to AHS page. For example, when you go to > cluster/app/application_1, if RM no longer remember the application, we will > simply get "Failed to read the application application_1", but it will be > good for RM ui to smartly try to redirect to AHS ui > /applicationhistory/app/application_1 to see if it's there. The redirect > usage already exist for logs in nodemanager UI. > Also, when AHS is enabled, WebAppProxyServlet should redirect to AHS page on > fall back of RM not remembering the app. YARN-3975 tried to do this only when > original tracking url is not set. But there are many cases, such as when app > failed at launch, original tracking url will be set to point to RM page, so > redirect to AHS page won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails for V2 scenarios
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064170#comment-15064170 ] Naganarasimha G R commented on YARN-4350: - Hi [~varun_saxena], As discussed offline, this seems to be a problem with the Distributed Shell AM. {{TestDistributedShell.checkTimelineV1}} checks whether only 2 (requested) containers are launched, but in reality more than 2 are getting launched. Possible reasons for it are: * The RM has assigned additional containers and the Distributed Shell AM launches them. I had observed similar over-assigning behavior in MR too, but the MR AM takes care of returning the extra containers assigned by the RM; a similar approach should exist in the Distributed Shell AM too (see the sketch after this message). * The RM killed a container for some reason and an extra container was launched. Not sure which of these cases is causing the assignment of additional containers; to analyze this we require more RM and AM logs, which the test case logs do not provide, and furthermore it is not related to the fixes in this issue. IMO it is also possible in trunk too. So I think we can raise another JIRA to track this! > TestDistributedShell fails for V2 scenarios > --- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch, > YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
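As a purely illustrative aside, this is roughly what "returning extra containers" could look like in an AM built on {{AMRMClientAsync}}. The {{numAllocatedContainers}}/{{numTotalContainers}} counters and the {{launchContainer}} call are placeholder names in the Distributed Shell style, not the actual fix for this issue.
{code}
// Illustrative sketch: release containers allocated beyond what the AM asked for,
// instead of launching them (counter and field names are placeholders).
@Override
public void onContainersAllocated(List<Container> allocatedContainers) {
  for (Container container : allocatedContainers) {
    if (numAllocatedContainers.incrementAndGet() > numTotalContainers) {
      // surplus allocation: give it back to the RM rather than launching it
      amRMClient.releaseAssignedContainer(container.getId());
      numAllocatedContainers.decrementAndGet();
    } else {
      launchContainer(container);   // existing launch path (placeholder)
    }
  }
}
{code}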
[jira] [Commented] (YARN-4480) Clean up some inappropriate imports
[ https://issues.apache.org/jira/browse/YARN-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064145#comment-15064145 ] Daniel Templeton commented on YARN-4480: Thanks, [~drankye]. Looks good in general. Your indentation is off in the {{Strings}} line:
{code}
- !Strings.isEmpty(rr.getResourceName()) ? rr
+ !Strings.isNullOrEmpty(rr.getResourceName()) ? rr
{code}
The original line was correctly indented. > Clean up some inappropriate imports > --- > > Key: YARN-4480 > URL: https://issues.apache.org/jira/browse/YARN-4480 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Kai Zheng > Attachments: YARN-4480-v1.patch > > > It was noticed there are some unnecessary dependency into Directory classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064134#comment-15064134 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] 1. Yes, this is the expected behavior. If you take a look at the discussion from the beginning of this thread, we have decided that if multiple increase tokens are granted by the RM in a row for a container before the AM uses any of the tokens, the last token will take effect, and any previous tokens will be effectively cancelled. If the RM sees a difference between its own number and the number reported from the NM, it will consider it an *unconfirmed* state, and won't set the lastConfirmed value. Besides, if the AM issues multiple increase requests but doesn't use the last token, it is considered a user error. 2. If I understand your question correctly, then you are right that you should pass container B. In fact, the container B you are talking about is technically still container A, as uniquely identified by the container ID. When the resource increase request of container A is granted by the RM, the RM still sends back container A, but with updated resource and token. As an Application Master developer, you are expected to track all live containers in the AM, and in the onContainersResourceChanged(List<Container> changedContainers) callback function, you need to replace the original container A with the updated container A. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
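A hedged sketch of the AM-side bookkeeping described in point 2, assuming an {{AMRMClientAsync}}-style callback handler. The {{liveContainers}} map and the NM-side call are illustrative assumptions; the only thing taken from the comment itself is that the updated container keeps the same ContainerId and carries the new resource and token.
{code}
// Illustrative only: keep one map keyed by ContainerId and overwrite the entry when an
// increase is granted, so later NM calls use the container object carrying the new token.
private final Map<ContainerId, Container> liveContainers = new ConcurrentHashMap<>();

@Override
public void onContainersResourceChanged(List<Container> changedContainers) {
  for (Container updated : changedContainers) {
    // same ContainerId as the original container, but with the updated resource + token
    liveContainers.put(updated.getId(), updated);
    nmClientAsync.increaseContainerResourceAsync(updated);  // assumed NM-side call
  }
}
{code}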
[jira] [Updated] (YARN-4467) Shell.checkIsBashSupported swallowed an interrupted exception
[ https://issues.apache.org/jira/browse/YARN-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated YARN-4467: -- Description:
Edit: moved this JIRA from HADOOP to YARN, as Shell.checkIsBashSupported() is used, and only used, in YARN.
Shell.checkIsBashSupported() creates a bash shell command to verify whether the system supports bash. However, its error message is misleading and the logic should be updated. If the shell command throws an IOException, it does not necessarily mean that bash failed to run: if the shell command process was interrupted, its internal logic throws an InterruptedIOException, which is a subclass of IOException.
{code:title=Shell.checkIsBashSupported|borderStyle=solid}
ShellCommandExecutor shexec;
boolean supported = true;
try {
  String[] args = {"bash", "-c", "echo 1000"};
  shexec = new ShellCommandExecutor(args);
  shexec.execute();
} catch (IOException ioe) {
  LOG.warn("Bash is not supported by the OS", ioe);
  supported = false;
}
{code}
An example appeared in a recent Jenkins job: https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apache.hadoop.ipc/TestRPCWaitForProxy/testInterruptedWaitForProxy/
The test TestRPCWaitForProxy.testInterruptedWaitForProxy starts a thread, waits for 1 second, and then interrupts the thread, expecting it to terminate. However, Shell.checkIsBashSupported swallowed the interrupt, and the test therefore failed.
{noformat}
2015-12-16 21:31:53,797 WARN util.Shell (Shell.java:checkIsBashSupported(718)) - Bash is not supported by the OS
java.io.InterruptedIOException: java.lang.InterruptedException
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
	at org.apache.hadoop.util.Shell.run(Shell.java:838)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
	at org.apache.hadoop.util.Shell.checkIsBashSupported(Shell.java:716)
	at org.apache.hadoop.util.Shell.(Shell.java:705)
	at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
	at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:639)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:803)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:773)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:646)
	at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:397)
	at org.apache.hadoop.ipc.RPC.waitForProtocolProxy(RPC.java:350)
	at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:330)
	at org.apache.hadoop.ipc.TestRPCWaitForProxy$RpcThread.run(TestRPCWaitForProxy.java:115)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at java.lang.UNIXProcess.waitFor(UNIXProcess.java:264)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:920)
	... 15 more
{noformat}
The original design is undesirable: it swallowed a potential interrupt, causing TestRPCWaitForProxy.testInterruptedWaitForProxy to fail. Unfortunately, Java does not allow this static method to throw the exception. We should remove the static member variable so that the method can throw the interrupted exception, and the node manager should call the static method instead of reading the static member variable.
This fix has an associated benefit: tests could run faster, because they would no longer need to spawn a bash process whenever a Shell static variable is accessed (which happens quite often, e.g. when checking what operating system Hadoop is running on).

was: Shell.checkIsBashSupported() creates a bash shell command to verify if the system supports bash. However, its error message is misleading, and the logic should be updated. If the shell command throws an IOException, it does not imply the bash did not run successfully. If the shell command process was interrupted, its internal logic throws an InterruptedIOException, which is a subclass of IOException. {code:title=Shell.checkIsBashSupported|borderStyle=solid} ShellCommandExecutor shexec; boolean supported = true; try { String[] args = {"bash", "-c", "echo 1000"}; shexec = new ShellCommandExecutor(args); shexec.execute(); } catch (IOException ioe) { LOG.warn("Bash is not supported by the OS", ioe); supported = false; } {code} An example of it appeared in a recent jenkins job https://builds.apache.org/job/PreCommit-HADOOP-Build/8257/testReport/org.apach
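To make the proposed YARN-4467 change concrete, here is a minimal, hypothetical sketch (not the committed patch) of a checkIsBashSupported() that no longer swallows the interrupt: the InterruptedIOException is caught separately and rethrown to the caller instead of being logged as "bash is not supported". It assumes the surrounding org.apache.hadoop.util.Shell context (LOG, the ShellCommandExecutor inner class, and the java.io imports).
{code:title=Sketch: checkIsBashSupported without swallowing the interrupt|borderStyle=solid}
// Hypothetical sketch only -- assumes it lives inside org.apache.hadoop.util.Shell,
// where LOG and ShellCommandExecutor already exist and java.io.* is imported.
public static boolean checkIsBashSupported() throws InterruptedIOException {
  ShellCommandExecutor shexec;
  boolean supported = true;
  try {
    String[] args = {"bash", "-c", "echo 1000"};
    shexec = new ShellCommandExecutor(args);
    shexec.execute();
  } catch (InterruptedIOException iioe) {
    // Propagate the interrupt to the caller instead of mapping it to
    // "bash is not supported"; the caller can restore the interrupt status.
    throw iioe;
  } catch (IOException ioe) {
    LOG.warn("Bash is not supported by the OS", ioe);
    supported = false;
  }
  return supported;
}
{code}
With a signature like this, callers such as the node manager would invoke the method directly rather than reading a cached static field, which is the direction the description above proposes.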
[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064054#comment-15064054 ] Jason Lowe commented on YARN-914: - [~danzhi] the patch should be against trunk. We always commit first against trunk and then backport to prior releases in reverse release order (e.g.: trunk->branch-2->branch-2.8->branch-2.7) so that we avoid a situation where a feature or fix is in a release but disappears in a subsequently released version. See the [How to Contribute|http://wiki.apache.org/hadoop/HowToContribute] page for more information, including details on preparing and naming the patch, etc. Is this implementation in line with the design document on this JIRA, or is it using a different approach? > (Umbrella) Support graceful decommission of nodemanager > --- > > Key: YARN-914 > URL: https://issues.apache.org/jira/browse/YARN-914 > Project: Hadoop YARN > Issue Type: Improvement > Components: graceful >Affects Versions: 2.0.4-alpha >Reporter: Luke Lu >Assignee: Junping Du > Attachments: Gracefully Decommission of NodeManager (v1).pdf, > Gracefully Decommission of NodeManager (v2).pdf, > GracefullyDecommissionofNodeManagerv3.pdf > > > When NMs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact on running applications. > Currently, if a NM is decommissioned, all running containers on the NM need to > be rescheduled on other NMs. Furthermore, for finished map tasks, if their > map outputs are not fetched by the reducers of the job, these map tasks will > need to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
[ https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4454: --- Attachment: test.patch Attaching test code to reproduce the issue. > NM to nodelabel mapping going wrong after RM restart > > > Key: YARN-4454 > URL: https://issues.apache.org/jira/browse/YARN-4454 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: test.patch > > > *Node label mapping with the NodeManager goes wrong if a combination of > hostname and then NodeId is used to update the node label mapping* > *Steps to reproduce* > 1. Create a cluster with 2 NMs > 2. Add labels X,Y to the cluster > 3. Replace the label of node 1 using ,x > 4. Replace the label of node 1 by ,y > 5. Again replace the label of node 1 by ,x > Check the cluster label mapping: HOSTNAME1 will be mapped to X. > Now restart the RM 2 times; the NODE LABEL mapping of HOSTNAME1:PORT changes to Y. > {noformat} > 2015-12-14 17:17:54,901 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: > [,] > 2015-12-14 17:17:54,905 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on > nodes: > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:64318, labels=[ResourcePool_1] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-188:0, labels=[ResourcePool_null] > 2015-12-14 17:17:54,906 INFO > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: > NM=host-10-19-92-187:64318, labels=[ResourcePool_null] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
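For background, a purely illustrative Java fragment (not part of test.patch) showing the two kinds of keys involved in the report above: a hostname-only mapping, which appears in the log as port 0 (host-10-19-92-188:0), and the concrete host:port NodeId of a registered NodeManager. NodeId.newInstance is the real YARN factory; the HashMap store and the label values are assumptions used only to show how two distinct keys can carry different labels after the RM restarts.
{code:title=Illustration: hostname-only vs. host:port node label keys|borderStyle=solid}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

public class NodeLabelKeySketch {
  public static void main(String[] args) {
    // Hostname-only mapping: shows up with port 0 in the log above.
    NodeId hostOnly = NodeId.newInstance("host-10-19-92-188", 0);
    // Concrete NodeManager registration: hostname plus its port.
    NodeId concrete = NodeId.newInstance("host-10-19-92-188", 64318);

    // Illustrative store only: two distinct keys can end up holding different
    // labels, which matches the inconsistency reported after the RM restarts.
    Map<NodeId, Set<String>> labels = new HashMap<>();
    labels.put(hostOnly, Collections.singleton("X"));
    labels.put(concrete, Collections.singleton("Y"));
    System.out.println(labels);
  }
}
{code}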
[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4389: -- Attachment: 0004-YARN-4389.patch The patch has gone stale; rebasing it. [~djp], could you please help review it? > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch, 0002-YARN-4389.patch, > 0003-YARN-4389.patch, 0004-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a setting at the cluster level, or we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-rmApp level. We > should allow each AM to override this config, i.e. via submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
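For reference, a minimal sketch of how the two cluster-wide keys named in this JIRA can be read through Hadoop's Configuration API; the fallback values passed here are illustrative assumptions, not the actual YARN defaults. The proposal in the issue is to let each application override these values (e.g. via its submission context) instead of relying on a single cluster-level setting.
{code:title=Sketch: reading the cluster-level AM blacklisting keys|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;

public class AmBlacklistConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Cluster-wide knobs discussed in this JIRA; the defaults below are
    // illustrative only, not the real YARN defaults.
    boolean enabled = conf.getBoolean("yarn.am.blacklisting.enabled", false);
    float threshold =
        conf.getFloat("yarn.am.blacklisting.disable-failure-threshold", 0.8f);
    System.out.println("AM blacklisting enabled=" + enabled
        + ", disable-failure-threshold=" + threshold);
    // Per the proposal, an application would carry its own values and override
    // this cluster-level configuration rather than being bound by it.
  }
}
{code}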
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064004#comment-15064004 ] Sunil G commented on YARN-4304: --- Test case failures are not related; apart from the known test failures, the others passed locally. [~leftnoteasy], could you please help take a look at the latest patch? As mentioned above, {noformat} ("Max Application Master Resources Per User:", resourceUsages.getAMResourceLimit().toString()); {noformat} I am using the AM resource limit as the per-user AM limit, which is not really correct; getting the exact per-user value would require a somewhat roundabout approach. I based this implementation on the comment below. Is this what you also expect? bq. I suggest to remove it and we can use amResourceLimit of first user of queues to show on UI. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, > 0005-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, the UI and various metrics also need to display the correct > configurations related to the same. > For e.g.: the current UI still shows the am-resource percentage at the per-queue level. This > is to be updated correctly when a label config is used. > - Display max-am-percentage per-partition in the Scheduler UI (for labels also) and on the > ClusterMetrics page > - Update queue/partition related metrics w.r.t. per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4477) FairScheduler: encounter infinite loop in attemptScheduling
[ https://issues.apache.org/jira/browse/YARN-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063959#comment-15063959 ] Hadoop QA commented on YARN-4477: -
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 56s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 38s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s {color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 159m 46s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/a