[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001800#comment-15001800 ] zhihai xu commented on YARN-4344:
---------------------------------

Thanks for reporting this issue [~vvasudev]! Thanks for the review [~Jason Lowe]! [~rohithsharma] tried to clean up the code at YARN-3286. Based on the following comment from [~jianhe] at YARN-3286,
{code}
I think this has changed the behavior that without any RM/NM restart features enabled, earlier restarting a node will trigger RM to kill all the containers on this node, but now it won't ?
{code}
the patch may cause a compatibility issue. Maybe we can merge the case {{rmNode.getHttpPort() == newNode.getHttpPort()}} with {{rmNode.getHttpPort() != newNode.getHttpPort()}} for noRunningApps. Thoughts?

> NMs reconnecting with changed capabilities can lead to wrong cluster resource
> calculations
> -----------------------------------------------------------------------------
>
> Key: YARN-4344
> URL: https://issues.apache.org/jira/browse/YARN-4344
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.1, 2.6.2
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Priority: Critical
> Attachments: YARN-4344.001.patch
>
> After YARN-3802, if an NM re-connects to the RM with changed capabilities,
> there can arise situations where the overall cluster resource calculation for
> the cluster will be incorrect, leading to inconsistencies in scheduling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
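The double-count risk the issue describes can be illustrated with a small sketch (illustrative Java only; the class and method names here are hypothetical, not the actual RM code): when a known node reconnects with a different capability, the node's stale contribution must be removed from the cluster total before the new one is added.

```java
// Hypothetical sketch of the bookkeeping YARN-4344 is about; names are
// invented for illustration and are not the ResourceManager's classes.
public class ClusterCapacity {
    private long totalMemMb;

    public ClusterCapacity(long initialMemMb) {
        this.totalMemMb = initialMemMb;
    }

    /** Called when a known node reconnects, possibly with a new capability. */
    public long onNodeReconnect(long oldNodeMemMb, long newNodeMemMb) {
        // Subtract the stale contribution before adding the fresh one;
        // skipping the subtraction is exactly the kind of inconsistency
        // that leads to a wrong cluster-wide resource total.
        totalMemMb = totalMemMb - oldNodeMemMb + newNodeMemMb;
        return totalMemMb;
    }
}
```

For example, a 100 GB cluster whose node grows from 10 GB to 20 GB should end up at 110 GB, not 120 GB.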
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001783#comment-15001783 ] Hadoop QA commented on YARN-4051:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 3m 11s | trunk passed |
| +1 | compile | 0m 54s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 51s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 1m 21s | trunk passed |
| +1 | mvneclipse | 0m 39s | trunk passed |
| -1 | findbugs | 1m 19s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. |
| +1 | javadoc | 1m 29s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 3m 59s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 0m 53s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 53s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 50s | the patch passed |
| -1 | checkstyle | 0m 27s | Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 265, now 265). |
| +1 | mvnsite | 1m 22s | the patch passed |
| +1 | mvneclipse | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 0s | The patch has no ill-formed XML file. |
| +1 | findbugs | 4m 10s | the patch passed |
| +1 | javadoc | 1m 40s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 4m 11s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 0m 24s | hadoop-yarn-api in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 2m 4s | hadoop-yarn-common in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 9m 0s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 0m 24s | hadoop-yarn-api in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 2m 8s | hadoop-yarn-common in the patch passed with JDK
[jira] [Updated] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4051:
---------------------------
Attachment: YARN-4051.05.patch

Set the default timeout to 2 min, since the default NM expiry timeout is 10 min.

> ContainerKillEvent is lost when container is In New State and is recovering
> ---------------------------------------------------------------------------
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: sandflee
> Assignee: sandflee
> Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch,
> YARN-4051.03.patch, YARN-4051.04.patch, YARN-4051.05.patch
>
> As in YARN-4050, the NM event dispatcher is blocked and the container is in the New
> state; when we finish the application, the container stays alive even after the NM
> event dispatcher is unblocked.
[jira] [Commented] (YARN-4050) NM event dispatcher may blocked by LogAggregationService if NameNode is slow
[ https://issues.apache.org/jira/browse/YARN-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001734#comment-15001734 ] sandflee commented on YARN-4050:
--------------------------------

There may be two problems:
1. The NM dispatcher may be blocked by the log aggregation service; should we move log aggregation events to a new event dispatcher?
2. NM recovery is blocked, which has some bad effects as described in YARN-4051.

> NM event dispatcher may blocked by LogAggregationService if NameNode is slow
> ----------------------------------------------------------------------------
>
> Key: YARN-4050
> URL: https://issues.apache.org/jira/browse/YARN-4050
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: sandflee
>
> env: NM restart and log aggregation are enabled.
> The NN is almost dead; when we restart the NM, the NM event dispatcher is blocked
> until the NN returns to normal. It seems the NM recovered the app and sent an
> APPLICATION_START event to the log aggregation service, which checks the log dir
> permission in HDFS (BLOCKED).
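Point 1 above (routing slow log-aggregation events to their own dispatcher) can be sketched as follows. This is a minimal, hedged illustration, not the NM's actual AsyncDispatcher code; the class name and structure are invented for the example. The key property is that a handler stalled on an HDFS call only blocks its own thread, while the main dispatcher's hand-off returns immediately.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Illustrative sketch (hypothetical names): give slow log-aggregation
// events a dedicated queue + thread so a stalled NameNode call cannot
// block the main NM event dispatcher.
public class SlowEventOffloader<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();

    public SlowEventOffloader(Consumer<E> slowHandler) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    // May block on HDFS for a long time; only this thread stalls.
                    slowHandler.accept(queue.take());
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }, "log-aggregation-dispatcher");
        t.setDaemon(true);
        t.start();
    }

    /** Called from the main dispatcher: enqueue and return immediately. */
    public void offer(E event) {
        queue.offer(event); // unbounded queue, never blocks the caller
    }
}
```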
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001725#comment-15001725 ] sandflee commented on YARN-4051:
--------------------------------

Thanks [~jlowe]!

Should the value be infinite by default? The concern is that if one container has issues recovering (due to log aggregation woes or whatever) then we risk expiring all of the containers on this node if we don't re-register with the RM within the node expiry interval. I think it makes sense if we have also fixed the recovery paths so there aren't potentially long-running procedures (like contacting HDFS) during the recovery process. If we haven't then we could create as many problems as we're solving by waiting forever.
-- Agree! I share this concern.

Why does the patch change the check interval? If it's to reduce the logging then we can better fix that by only logging when the status changes rather than every iteration.
-- Yes, it was to reduce the logging; since recovery is usually very fast, I changed it back.

Nit: A value of zero should also be treated as a disabled max time.
-- Zero means register with the RM at once, whether or not the NM has completed recovery, yes?

Nit: "Max time to wait NM to complete container recover before register to RM" should be "Max time NM will wait to complete container recovery before registering with the RM".
-- Corrected.

> ContainerKillEvent is lost when container is In New State and is recovering
> ---------------------------------------------------------------------------
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: sandflee
> Assignee: sandflee
> Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch,
> YARN-4051.03.patch, YARN-4051.04.patch
>
> As in YARN-4050, the NM event dispatcher is blocked and the container is in the New
> state; when we finish the application, the container stays alive even after the NM
> event dispatcher is unblocked.
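The timeout semantics under discussion (a bounded wait for container recovery before registering with the RM, where a non-positive value means "register at once") can be sketched like this. All names are hypothetical; this is not the patch's code, just an illustration of the agreed-upon behavior.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch (invented names): poll a "recovery done" condition
// up to maxWaitMs before registering with the RM. A value <= 0 disables
// the wait entirely, so registration proceeds immediately.
public class RecoveryGate {
    public static boolean waitForRecovery(BooleanSupplier recovered,
                                          long maxWaitMs,
                                          long pollMs) {
        if (maxWaitMs <= 0) {
            // Disabled: do not wait at all, report whatever state we are in.
            return recovered.getAsBoolean();
        }
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!recovered.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // give up and register with the RM anyway
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return recovered.getAsBoolean();
            }
        }
        return true;
    }
}
```

The bounded default (e.g. 2 min against a 10 min node-expiry interval) keeps a single stuck container from expiring every container on the node.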
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001722#comment-15001722 ] Hadoop QA commented on YARN-3769:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 9s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| +1 | mvninstall | 3m 14s | trunk passed |
| +1 | compile | 0m 25s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 26s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 1m 17s | trunk passed |
| +1 | javadoc | 0m 25s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 30s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 0m 30s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | checkstyle | 0m 12s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 23s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 32s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 65m 47s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 64m 35s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 26s | Patch does not generate ASF License warnings. |
| | | 142m 58s | |

|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| | hadoop.yarn.server.resourcemanager.TestRM |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771905/YARN-3769.005.patch |
| JIRA Issue | YARN-3769 |
| Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile |
| uname | Linux dbfe7410cae9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed S
[jira] [Commented] (YARN-2480) DockerContainerExecutor must support user namespaces
[ https://issues.apache.org/jira/browse/YARN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001707#comment-15001707 ] Abin Shahab commented on YARN-2480:
-----------------------------------

Thanks, yes, it's interesting, but it demands a contiguous id space for all tasks/docker containers. We are wondering how to distribute the ids among the tasks (do all tasks get the same range? Do they all get separate ranges? Do all tasks belonging to the same job get the same range?)

> DockerContainerExecutor must support user namespaces
> ----------------------------------------------------
>
> Key: YARN-2480
> URL: https://issues.apache.org/jira/browse/YARN-2480
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Abin Shahab
> Labels: security
>
> When DockerContainerExecutor launches a container, the root inside that
> container has root privileges on the host.
> This is insecure in a multi-tenant environment. The uid of the container's
> root user must be mapped to a non-privileged user on the host.
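The contiguous-id-space question above is, at bottom, range arithmetic: if the host reserves one contiguous block of subordinate uids (as with /etc/subuid-style mappings) and each container is handed a disjoint slice, then "root" (uid 0) inside container i maps to a distinct unprivileged host uid. Whether slices are per-task, per-job, or shared is exactly the open question in the comment. A minimal sketch with invented names, not DockerContainerExecutor code:

```java
// Illustrative arithmetic only (hypothetical names): map uid 0 inside the
// i-th container to a distinct unprivileged host uid by slicing one
// contiguous subordinate-uid block into fixed-size, non-overlapping ranges.
public class SubUidRanges {
    /** Host uid that uid 0 in the i-th container maps to. */
    public static long hostBaseUid(long rangeStart, long uidsPerContainer, long containerIndex) {
        return rangeStart + containerIndex * uidsPerContainer;
    }

    /** True if two containers' slices cannot overlap (by construction they never do). */
    public static boolean disjoint(long rangeStart, long uidsPerContainer, long i, long j) {
        long endI = hostBaseUid(rangeStart, uidsPerContainer, i) + uidsPerContainer;
        long startJ = hostBaseUid(rangeStart, uidsPerContainer, j);
        return i == j || endI <= startJ || startJ + uidsPerContainer <= hostBaseUid(rangeStart, uidsPerContainer, i);
    }
}
```

With a 65536-uid slice per container starting at host uid 100000, container 0's root maps to 100000 and container 2's root to 231072, so no container's root is privileged on the host.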
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001695#comment-15001695 ] Hadoop QA commented on YARN-4349:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 10 new or modified test files. |
| +1 | mvninstall | 3m 15s | trunk passed |
| +1 | compile | 4m 51s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 4m 33s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 1m 3s | trunk passed |
| +1 | mvneclipse | 0m 59s | trunk passed |
| +1 | findbugs | 4m 33s | trunk passed |
| +1 | javadoc | 2m 2s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 2m 18s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 2m 49s | the patch passed |
| +1 | compile | 4m 44s | the patch passed with JDK v1.8.0_60 |
| +1 | cc | 4m 44s | the patch passed |
| +1 | javac | 4m 44s | the patch passed |
| +1 | compile | 4m 33s | the patch passed with JDK v1.7.0_79 |
| +1 | cc | 4m 33s | the patch passed |
| +1 | javac | 4m 33s | the patch passed |
| -1 | checkstyle | 1m 2s | Patch generated 10 new checkstyle issues in root (total was 489, now 492). |
| +1 | mvneclipse | 0m 59s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 24 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | findbugs | 1m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager introduced 1 new FindBugs issues. |
| +1 | javadoc | 2m 3s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 2m 22s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 7m 0s | hadoop-common in the patch failed with JDK v1.8.0_60. |
| +1 | unit | 0m 25s | hadoop-yarn-server-common in the patch passed with JDK v1.8.0_60. |
| -1 | unit | 64m 31s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. |
| +1 | unit | 9m 56s | hadoop-mapreduce-client-app in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 7m 48s | hadoop-common in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 0m 27s | hadoop-yarn-server-common in the patch passed with JDK v1.7.0_79. |
| -1 | uni
[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001666#comment-15001666 ] Hadoop QA commented on YARN-4347:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 7s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 52s | trunk passed |
| +1 | compile | 0m 36s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 31s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvneclipse | 0m 19s | trunk passed |
| +1 | findbugs | 1m 36s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 35s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 31s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 31s | the patch passed |
| +1 | compile | 0m 28s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 28s | the patch passed |
| -1 | checkstyle | 0m 15s | Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 237, now 239). |
| +1 | mvneclipse | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 40s | the patch passed |
| +1 | javadoc | 0m 32s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 33s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 64m 6s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 66m 17s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 28s | Patch does not generate ASF License warnings. |
| | | 145m 32s | |

|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771892/YARN-4347.2.patch |
| JIRA Issue | YARN-4347 |
| Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile |
| uname | Linux 0ae13b980ea7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMP
[jira] [Commented] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters
[ https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001654#comment-15001654 ] Cheng-Hsin Cho commented on YARN-4327:
--------------------------------------

Did you try using yarn.timeline-service.http-authentication.type=kerberos?
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/ref-c2f35f55-fa15-4154-b80a-36df2db297d5.1.html

> RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters
> -------------------------------------------------------------
>
> Key: YARN-4327
> URL: https://issues.apache.org/jira/browse/YARN-4327
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, timelineserver
> Affects Versions: 2.7.1
> Environment: hadoop 2.7.1; hdfs, yarn, mrhistoryserver, and ATS all use
> kerberos security.
> conf like this:
> <property>
>   <name>hadoop.security.authorization</name>
>   <value>true</value>
>   <description>Is service-level authorization enabled?</description>
> </property>
> <property>
>   <name>hadoop.security.authentication</name>
>   <value>kerberos</value>
>   <description>Possible values are simple (no authentication), and kerberos</description>
> </property>
> Reporter: zhangshilong
>
> hadoop 2.7.1
> ATS conf like this:
> <property>
>   <name>yarn.timeline-service.http-authentication.type</name>
>   <value>simple</value>
> </property>
> <property>
>   <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
>   <value>HTTP/_h...@xxx.com</value>
> </property>
> <property>
>   <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
>   <value>/etc/hadoop/keytabs/xxx.keytab</value>
> </property>
> <property>
>   <name>yarn.timeline-service.principal</name>
>   <value>xxx/_h...@xxx.com</value>
> </property>
> <property>
>   <name>yarn.timeline-service.keytab</name>
>   <value>/etc/hadoop/keytabs/xxx.keytab</value>
> </property>
> <property>
>   <name>yarn.timeline-service.best-effort</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.timeline-service.enabled</name>
>   <value>true</value>
> </property>
> I'd like to allow everyone to access the ATS over HTTP, as with the RM and HDFS.
> The client can submit a job to the RM and add the TIMELINE_DELEGATION_TOKEN to the AM
> context, but the RM can not renew the TIMELINE_DELEGATION_TOKEN, which causes the
> application to fail.
> RM logs:
> 2015-11-03 11:58:38,191 WARN
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
> Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, > Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, > realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, > masterKeyId=2) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: HTTP status [500], message [Null user] > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400) > at > org.apache.hadoop.yarn.security.client.TimelineDelegat
[jira] [Commented] (YARN-4184) Remove update reservation state api from state store as its not used by ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001625#comment-15001625 ] Hadoop QA commented on YARN-4184:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 3m 5s | trunk passed |
| +1 | compile | 0m 21s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 22s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 12s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 7s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 0m 28s | the patch passed |
| +1 | compile | 0m 21s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 21s | the patch passed |
| +1 | compile | 0m 23s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 23s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 0m 25s | the patch passed with JDK v1.7.0_79 |
| -1 | unit | 58m 42s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. |
| -1 | unit | 59m 3s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m 22s | Patch does not generate ASF License warnings. |
| | | 129m 1s | |

|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |

|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-12 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771140/YARN-4184.v1.patch |
| JIRA Issue | YARN-4184 |
| Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile |
| uname | Linux f52476d30bd3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/wo
[jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-3769: - Attachment: YARN-3769.005.patch [~leftnoteasy], Attaching YARN-3769.005.patch with the changes we discussed. I have another question that may be an enhancement: In {{LeafQueue#getTotalPendingResourcesConsideringUserLimit}}, the calculation of headroom is as follows in this patch: {code} Resource headroom = Resources.subtract( computeUserLimit(app, resources, user, partition, SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), user.getUsed(partition)); {code} Would it be more efficient to just do the following? {code} Resource headroom = Resources.subtract(user.getUserResourceLimit(), user.getUsed()); {code} > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch, > YARN-3769.003.patch, YARN-3769.004.patch, YARN-3769.005.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
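The two headroom computations compared in the comment above can be sketched with minimal stand-ins. These `Resource`/`subtract`/`computeUserLimit` definitions are simplified assumptions for illustration, not the real Hadoop `Resources`/`LeafQueue` API; the point is only that both snippets compute headroom = user limit minus used, and the proposed variant would reuse a cached user limit instead of recomputing it per call.

```java
// Minimal stand-ins for Hadoop's Resource/Resources classes, used only to
// illustrate the headroom computation discussed above; field names and the
// user-limit stub are assumptions, not the real YARN API.
class HeadroomSketch {
    static final class Resource {
        final long memoryMB;
        final int vcores;
        Resource(long memoryMB, int vcores) { this.memoryMB = memoryMB; this.vcores = vcores; }
    }

    // Component-wise subtraction, clamped at zero.
    static Resource subtract(Resource a, Resource b) {
        return new Resource(Math.max(0, a.memoryMB - b.memoryMB),
                            Math.max(0, a.vcores - b.vcores));
    }

    // Placeholder for LeafQueue#computeUserLimit: the real method derives the
    // limit from queue capacity, partition resources, and the scheduling mode.
    static Resource computeUserLimit(Resource queueCapacity) {
        return queueCapacity;
    }

    public static void main(String[] args) {
        // Both snippets in the comment compute headroom = userLimit - used;
        // the cheaper variant would reuse a cached value
        // (user.getUserResourceLimit()) instead of recomputing it here.
        Resource userLimit = computeUserLimit(new Resource(8192, 8));
        Resource used = new Resource(4096, 2);
        Resource headroom = subtract(userLimit, used);
        System.out.println(headroom.memoryMB + " " + headroom.vcores); // 4096 6
    }
}
```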
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001621#comment-15001621 ] Naganarasimha G R commented on YARN-4234: - Thanks [~gtCarrera]. It would be helpful if you could also verify that {{yarn.timeline-service.enabled}} is a client-only config, as explained in the documentation. From my side, apart from the RM's SystemMetricsPublisher using it, I don't see any other code in the server using it. Any thoughts about the approach for getting the version and related conf from the server side? I have also commented the same in YARN-4183. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, > YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2480) DockerContainerExecutor must support user namespaces
[ https://issues.apache.org/jira/browse/YARN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001610#comment-15001610 ] Erik Weathers commented on YARN-2480: - Experimental support for user namespaces in Docker has landed in version 1.9: * http://integratedcode.us/2015/10/13/user-namespaces-have-arrived-in-docker/ * https://github.com/docker/docker/pull/12648 It's available via experimental builds (not in the normal 1.9 build); see this link for info on getting those builds: * https://github.com/docker/docker/tree/master/experimental Documentation on the feature: * https://github.com/docker/docker/blob/3b5fac462d21ca164b3778647420016315289034/experimental/userns.md > DockerContainerExecutor must support user namespaces > > > Key: YARN-2480 > URL: https://issues.apache.org/jira/browse/YARN-2480 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Abin Shahab > Labels: security > > When DockerContainerExecutor launches a container, the root inside that > container has root privileges on the host. > This is insecure in a multi-tenant environment. The uid of the container's > root user must be mapped to a non-privileged user on the host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001607#comment-15001607 ] Naganarasimha G R commented on YARN-4183: - Hi all, I recently observed that there is already an existing configuration, {{"yarn.timeline-service.client.best-effort"}}, which ensures that YarnClient does not throw an exception even if delegation token fetching fails in {{YarnClient.submitApplication}}. So I feel there is a sufficient guard on the client side for when to create the TimelineClient, when to fetch the delegation token, and what action to take if it fails. So as part of this JIRA, as mentioned by [~vinodkv], I think we just need to remove the check for {{yarn.timeline-service.enabled}} being used in {{SystemMetricsPublisher}}, as it is a client-side configuration (correct me if I am wrong). And as part of YARN-4234 we are in any case concentrating on how to support multiple versions in TimelineClient, right? > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
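The best-effort behavior described in the comment above amounts to a guard of the following shape. The config key is the one named in the comment; everything else (method names, the simulated fetch) is an illustrative stand-in, not the real YarnClient code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Illustrative sketch of a "best effort" guard around timeline delegation-token
// fetching, in the spirit of yarn.timeline-service.client.best-effort: when the
// flag is true, a failed fetch is swallowed instead of failing submission.
// fetchDelegationToken and the plain-Map conf are stand-ins, not the real API.
class BestEffortTokenFetch {
    // Simulated token fetch that always fails, standing in for the RPC call.
    static String fetchDelegationToken() throws IOException {
        throw new IOException("timeline server unreachable");
    }

    // Returns the token, or null when fetching failed and best-effort is enabled.
    static String getTokenOrNull(Map<String, String> conf) {
        boolean bestEffort = Boolean.parseBoolean(
            conf.getOrDefault("yarn.timeline-service.client.best-effort", "false"));
        try {
            return fetchDelegationToken();
        } catch (IOException e) {
            if (bestEffort) {
                return null; // proceed with submission without a timeline token
            }
            throw new UncheckedIOException(e); // strict mode: surface the failure
        }
    }
}
```

With the flag set to true the caller gets null and submission proceeds; with the default (false) the failure propagates, which matches the strict behavior the comment says the guard relaxes.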
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001581#comment-15001581 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #598 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/598/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001491#comment-15001491 ] Naganarasimha G R commented on YARN-4350: - Thanks [~sjlee0] for pointing this issue out; I will take a look at it! > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-4350: --- Assignee: Naganarasimha G R > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. > (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4350) TestDistributedShell fails
Sangjin Lee created YARN-4350: - Summary: TestDistributedShell fails Key: YARN-4350 URL: https://issues.apache.org/jira/browse/YARN-4350 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. There seem to be 2 distinct issues. (1) testDSShellWithoutDomainV2* tests fail sporadically These test fail more often than not if tested by themselves: {noformat} testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 30.998 sec <<< FAILURE! java.lang.AssertionError: Application created event should be published atleast once expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) {noformat} They start happening after YARN-4129. I suspect this might have to do with some timing issue. (2) the whole test times out If you run the whole TestDistributedShell test, it times out without fail. This may or may not have to do with the port change introduced by YARN-2859 (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001484#comment-15001484 ] Hadoop QA commented on YARN-3862: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 5s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 failed with JDK v1.8.0_60. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 43 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice (total was 102, now 128). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_60. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 49s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 25s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 28m 5s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-12 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771888/YARN-3862-feature-YARN-2928.wip.03.patch | | JIRA Issue | YARN-3862 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Linux 54f9ecd19690 3
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001476#comment-15001476 ] Li Lu commented on YARN-4234: - Hi [~Naganarasimha], I'm not extremely familiar with the delegation token stuff in ATS v1, but probably you'd like to mention your comments here in YARN-4183? Also, I'd like to double check if {{yarn.timeline-service.enabled}} is a client only config? > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, > YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001472#comment-15001472 ] Mingliang Liu commented on YARN-4349: - Thanks for working on this, [~leftnoteasy]. Minor comment: {code} + CallerContext context = + new CallerContext(new CallerContext.Builder(pbContext.getContext()) + .setSignature(pbContext.getSignature().toByteArray())); {code} I think this code can be simplified as: {code} + CallerContext context = + new CallerContext.Builder(pbContext.getContext()) + .setSignature(pbContext.getSignature().toByteArray()).build(); {code} Is it a good idea to make the constructor {{CallerContext#CallerContext(Builder)}} private? > Support CallerContext in YARN > - > > Key: YARN-4349 > URL: https://issues.apache.org/jira/browse/YARN-4349 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4349.1.patch > > > More details about CallerContext please refer to description of HDFS-9184. > From YARN's perspective, we should make following changes: > - RMAuditLogger logs application's caller context when application submit by > user > - Add caller context to application's data in ATS along with application > creation event > From MR's perspective: > - Set AppMaster container's context to YARN's application Id > - Set Mapper/Reducer containers' context to task attempt id > Protocol and RPC changes are done in HDFS-9184. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
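The simplification Mingliang suggests is the standard builder idiom with a private constructor, sketched below. This is a simplified stand-in (note the `Sketch` name; the fields and getters are assumptions), not the real CallerContext class from HDFS-9184:

```java
// Simplified builder sketch matching the shape suggested in the comment above:
// the constructor is private, so the only way to obtain an instance is
// Builder#build(). Field names are assumptions, not the real CallerContext.
final class CallerContextSketch {
    private final String context;
    private final byte[] signature;

    private CallerContextSketch(Builder b) { // private: forces use of build()
        this.context = b.context;
        this.signature = b.signature;
    }

    String getContext() { return context; }
    byte[] getSignature() { return signature; }

    static final class Builder {
        private final String context;
        private byte[] signature;

        Builder(String context) { this.context = context; }

        Builder setSignature(byte[] signature) {
            this.signature = signature;
            return this; // fluent: allows chaining straight into build()
        }

        CallerContextSketch build() { return new CallerContextSketch(this); }
    }
}
```

With this shape the call site reads `new CallerContextSketch.Builder(ctx).setSignature(sig).build()`, matching the simplified form proposed in the comment, and the private constructor guarantees no one can bypass the builder.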
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001470#comment-15001470 ] Hadoop QA commented on YARN-3862: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 56s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 failed with JDK v1.8.0_60. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 43 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice (total was 102, now 128). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_60. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 55s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 22s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 27m 49s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-12 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771888/YARN-3862-feature-YARN-2928.wip.03.patch | | JIRA Issue | YARN-3862 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Linux 530c24bb2676
[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001467#comment-15001467 ] Jian He commented on YARN-4347: --- Thanks for reviewing, I fixed the space, I think it's clear enough without the assertion msg, as code comments are there. > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > Attachments: YARN-4347.1.patch, YARN-4347.2.patch > > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
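The stack trace above shows the NPE surfacing in CapacityScheduler.addApplicationAttempt while replaying an attempt-recovered event for a finished application. The toy class below illustrates the defensive shape of such a fix on a plain map rather than the real scheduler; it is a hypothetical sketch, not the committed YARN-4347 patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: during recovery, addApplicationAttempt looks
// up the attempt's application entry and dereferences it, which throws an
// NPE when the entry is absent (e.g. a finished app replayed from the state
// store). A null guard turns the NPE into a skipped event.
public class RecoveryGuardSketch {
    private static final Map<String, String> applications = new HashMap<>();

    static void addApplication(String appId) {
        applications.put(appId, "attempt-state-for-" + appId);
    }

    // Returns false (skip the event) instead of throwing when the
    // application is unknown.
    static boolean addApplicationAttempt(String appId) {
        String app = applications.get(appId);
        if (app == null) {
            // Without this check, calling a method on 'app' would NPE,
            // matching the failure point in the trace above.
            return false;
        }
        return app.length() > 0; // touch the object, as real code would
    }

    public static void main(String[] args) {
        System.out.println(addApplicationAttempt("application_1447242474882_0002"));
        addApplication("application_1447242474882_0002");
        System.out.println(addApplicationAttempt("application_1447242474882_0002"));
    }
}
```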
[jira] [Updated] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4347: -- Attachment: YARN-4347.2.patch > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > Attachments: YARN-4347.1.patch, YARN-4347.2.patch > > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001455#comment-15001455 ] Naganarasimha G R commented on YARN-4234: - Hi [~jlowe] & [~gtCarrera], AFAIK, YARN-4183 wanted to address a little more than what is captured by [~gtCarrera] (namely, whether the client should fail when the timeline server is not reachable). What I don't understand (or am missing) is that, per the current code (ATS v1), delegation tokens are created automatically in YarnClientImpl only when *"yarn.timeline-service.enabled"* is set, so I am not sure why [~jeagles] pointed out in YARN-4183 that all apps were trying to get delegation tokens. I understand that the SystemMetricsPublisher (server side) need not check this configuration, but I am not sure why the client should depend on a server-side configuration. Additionally, we already have the client-side configuration *"yarn.timeline-service.client.best-effort"* (I just realized this) which, when set, keeps {{YarnClient.submitApplication}} from failing if the delegation tokens cannot be obtained. If my understanding of the problem is correct, then as part of YARN-4183 we only need to remove the check on *"yarn.timeline-service.enabled"* in SystemMetricsPublisher; please correct me if I am wrong. As for this patch (support for v1.5), I envisage it as follows: * The server is configured with *"TIMELINE_SERVICE_VERSION"*, based on which the appropriate timeline handler is selected * Client apps that want to communicate with the timeline server enable *"yarn.timeline-service.enabled"* * If security and *"yarn.timeline-service.enabled"* are both enabled, the delegation token is obtained in the YARN client as before * When the timeline client is initialized, it contacts the server for the version and related configs, and once received it initializes itself. 
* If the user calls methods that are invalid (not appropriate to the server's timeline version), the timeline client throws an exception Thoughts? > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, > YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
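The last two points of the comment above (the client learns the server's timeline version at initialization and then rejects calls the server cannot handle) can be sketched as follows. All names here (supportsV15, putEntitiesV15, the float version value) are illustrative assumptions, not the actual TimelineClient API.

```java
// Hypothetical sketch of the proposed version-aware client: the client
// learns the server's timeline-service version once and then guards
// version-specific entry points.
public class VersionAwareTimelineClient {
    private final float serverVersion;

    VersionAwareTimelineClient(float serverVersion) {
        // In the proposal this would be discovered by contacting the
        // server during client initialization, not passed in directly.
        this.serverVersion = serverVersion;
    }

    static boolean supportsV15(float serverVersion) {
        return serverVersion >= 1.5f;
    }

    // A v1.5-only put API: reject the call when the server cannot handle it.
    String putEntitiesV15() {
        if (!supportsV15(serverVersion)) {
            throw new UnsupportedOperationException(
                "Server timeline version " + serverVersion
                    + " does not support ATS v1.5 put APIs");
        }
        return "accepted";
    }

    public static void main(String[] args) {
        System.out.println(new VersionAwareTimelineClient(1.5f).putEntitiesV15());
    }
}
```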
[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001454#comment-15001454 ] Hadoop QA commented on YARN-4347: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 237, now 239). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 41s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 151m 5s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-11-11 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771854/YARN-4347.1.patch | | JIRA Issue | YARN-4347 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Lin
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234-2015.2.patch Removed TimelineEntityGroupIdProto and TimelineEntityGroupIdPBImpl > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234-2015.2.patch, YARN-4234.1.patch, > YARN-4234.2.patch, YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3862: -- Attachment: YARN-3862-feature-YARN-2928.wip.03.patch Testing the new feature branch name (feature-YARN-2928). > Decide which contents to retrieve and send back in response in TimelineReader > - > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma-separated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4349: - Description: More details about CallerContext please refer to description of HDFS-9184. >From YARN's perspective, we should make following changes: - RMAuditLogger logs application's caller context when application submit by user - Add caller context to application's data in ATS along with application creation event >From MR's perspective: - Set AppMaster container's context to YARN's application Id - Set Mapper/Reducer containers' context to task attempt id Protocol and RPC changes are done in HDFS-9184. was: More details about CallerContext please refer to description of HDFS-9184. >From YARN's perspective, we should make following changes: - RMAuditLogger logs application's caller context when application submit by user - Add caller context to application's data in ATS along with application creation event Protocol and RPC changes are done in HDFS-9184. > Support CallerContext in YARN > - > > Key: YARN-4349 > URL: https://issues.apache.org/jira/browse/YARN-4349 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4349.1.patch > > > More details about CallerContext please refer to description of HDFS-9184. > From YARN's perspective, we should make following changes: > - RMAuditLogger logs application's caller context when application submit by > user > - Add caller context to application's data in ATS along with application > creation event > From MR's perspective: > - Set AppMaster container's context to YARN's application Id > - Set Mapper/Reducer containers' context to task attempt id > Protocol and RPC changes are done in HDFS-9184. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
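The description above (RMAuditLogger logging the caller context, the AM and task containers setting their own contexts) can be sketched with a minimal per-thread context. This mimics the shape of org.apache.hadoop.ipc.CallerContext introduced by HDFS-9184 but is a standalone illustration, not the Hadoop implementation; the auditLine format is an assumption.

```java
// Minimal sketch of the caller-context idea: a per-thread context string
// (the YARN application id for an AM, the task attempt id for a
// mapper/reducer) that an audit logger appends to each line.
public class CallerContextSketch {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    static void setCurrent(String context) { CONTEXT.set(context); }

    static String getCurrent() { return CONTEXT.get(); }

    // What an RMAuditLogger-style helper might emit per audit event.
    static String auditLine(String user, String operation) {
        String ctx = getCurrent();
        return "USER=" + user + "\tOPERATION=" + operation
            + (ctx == null ? "" : "\tCALLERCONTEXT=" + ctx);
    }

    public static void main(String[] args) {
        // An application master would set its context to the application id.
        setCurrent("application_1447242474882_0002");
        System.out.println(auditLine("varun", "Submit Application Request"));
    }
}
```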
[jira] [Updated] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4349: - Attachment: YARN-4349.1.patch Attached initial patch for review. Included all changes in description. > Support CallerContext in YARN > - > > Key: YARN-4349 > URL: https://issues.apache.org/jira/browse/YARN-4349 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4349.1.patch > > > More details about CallerContext please refer to description of HDFS-9184. > From YARN's perspective, we should make following changes: > - RMAuditLogger logs application's caller context when application submit by > user > - Add caller context to application's data in ATS along with application > creation event > Protocol and RPC changes are done in HDFS-9184. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-4349) Support CallerContext in YARN
[ https://issues.apache.org/jira/browse/YARN-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan moved HDFS-9418 to YARN-4349: Key: YARN-4349 (was: HDFS-9418) Project: Hadoop YARN (was: Hadoop HDFS) > Support CallerContext in YARN > - > > Key: YARN-4349 > URL: https://issues.apache.org/jira/browse/YARN-4349 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > > More details about CallerContext please refer to description of HDFS-9184. > From YARN's perspective, we should make following changes: > - RMAuditLogger logs application's caller context when application submit by > user > - Add caller context to application's data in ATS along with application > creation event > Protocol and RPC changes are done in HDFS-9184. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001416#comment-15001416 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2536 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2536/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
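The configuration split the issue above argues for (the metrics publisher following the application-history flag so that enabling the history server alone does not force every job to fetch a timeline delegation token) can be sketched as below. "yarn.timeline-service.enabled" is the key named in the discussion; the history key name and the boolean-map stand-in for Hadoop's Configuration are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed decision logic, not the patch itself.
public class PublisherConfigSketch {
    static final String TIMELINE_SERVICE_ENABLED =
        "yarn.timeline-service.enabled";
    // Assumed key name for the generic history flag.
    static final String APP_HISTORY_ENABLED =
        "yarn.timeline-service.generic-application-history.enabled";

    // Proposed: drive the system metrics publisher from the history flag.
    static boolean shouldPublishSystemMetrics(Map<String, Boolean> conf) {
        return conf.getOrDefault(APP_HISTORY_ENABLED, false);
    }

    // Clients keep deciding on delegation tokens from the timeline flag.
    static boolean clientNeedsTimelineToken(Map<String, Boolean> conf) {
        return conf.getOrDefault(TIMELINE_SERVICE_ENABLED, false);
    }

    // Small helper to build a one-entry config for demonstration.
    static Map<String, Boolean> confWith(String key, boolean value) {
        Map<String, Boolean> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        // History server on, timeline service off: metrics are published,
        // but jobs do not need a timeline delegation token.
        Map<String, Boolean> conf = confWith(APP_HISTORY_ENABLED, true);
        System.out.println(shouldPublishSystemMetrics(conf));
        System.out.println(clientNeedsTimelineToken(conf));
    }
}
```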
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Attachment: YARN-4348.001.patch The issue looks similar to YARN-3753. One workaround is to change the sync timeout to zkResyncWaitTime, as [~jianhe] did in YARN-3753. Attaching a patch for this. Increasing the timeout lowers the probability of hitting this case, but it can still happen, e.g. if the ZK server's reply packet for the sync is dropped after the operation itself has succeeded. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > Attachments: YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
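The race described above (syncInternal gives up after its wait expires even though the sync reply lands moments later) can be simulated with a latch and two different wait values. The latch, thread, and millisecond values here are a simulation of the timing, not the real ZKRMStateStore code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simulation: syncInternal waits on a latch that ZooKeeper's async sync()
// callback counts down. A wait shorter than the reply delay reports
// failure even though the sync itself succeeds.
public class SyncWaitSketch {
    // Returns true if the simulated sync callback arrived within waitMillis.
    static boolean waitForSync(long waitMillis, long callbackDelayMillis) {
        CountDownLatch synced = new CountDownLatch(1);
        Thread zkReply = new Thread(() -> {
            try {
                Thread.sleep(callbackDelayMillis); // simulated slow reply
            } catch (InterruptedException ignored) { }
            synced.countDown(); // the sync reply does arrive eventually
        });
        zkReply.start();
        try {
            boolean ok = synced.await(waitMillis, TimeUnit.MILLISECONDS);
            zkReply.join();
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        long replyDelay = 300;       // reply lands after 300 ms
        long shortTimeout = 50;      // a too-short wait misses it
        long resyncWaitTime = 2000;  // a longer wait sees the same reply
        System.out.println(waitForSync(shortTimeout, replyDelay));   // false
        System.out.println(waitForSync(resyncWaitTime, replyDelay)); // true
    }
}
```

The same reply arrives in both runs; only the wait value decides whether the store treats the sync as failed, which is why a longer zkResyncWaitTime shrinks (without fully closing) the failure window.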
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Affects Version/s: 2.6.2 > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > Attachments: log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Attachment: log.txt Attaching a log file. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > Attachments: log.txt > > > The current internal ZK configuration of ZKRMStateStore can cause a following > situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Description: Jian mentioned that the current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. We should use zkResyncWaitTime as the timeout value. was: The current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. We should use zkResyncWaitTime as the timeout value. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > Attachments: log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Description: The current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. We should use was: The current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. {quote} 2015-11-11 11:54:05,728 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1241)) - Failed to sync with ZK new connection. -<--- sync failed 2015-11-11 11:54:05,728 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1244)) - Maxed out ZK retries. Giving up! 2015-11-11 11:54:05,728 ERROR recovery.RMStateStore (RMStateStore.java:transition(292)) - Error updating appAttempt: appattempt_1447242474882_0002_01 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:286) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:269) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1006) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1075) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1070) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-11-11 11:54:05,729 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailed(1027)) - State store operation failed org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:286) at org.apache.hadoop.yarn.server.resou
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Description: The current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. We should use zkResyncWaitTime as the timeout value. was: The current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. We should use > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > > The current internal ZK configuration of ZKRMStateStore can cause a following > situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
Tsuyoshi Ozawa created YARN-4348: Summary: ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout Key: YARN-4348 URL: https://issues.apache.org/jira/browse/YARN-4348 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.2 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa The current internal ZK configuration of ZKRMStateStore can cause a following situation: 1. syncInternal timeouts, 2. but sync succeeded later on. {quote} 2015-11-11 11:54:05,728 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1241)) - Failed to sync with ZK new connection. -<--- sync failed 2015-11-11 11:54:05,728 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1244)) - Maxed out ZK retries. Giving up! 2015-11-11 11:54:05,728 ERROR recovery.RMStateStore (RMStateStore.java:transition(292)) - Error updating appAttempt: appattempt_1447242474882_0002_01 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:286) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:269) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1006) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1075) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1070) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) at java.lang.Thread.run(Thread.java:745) 2015-11-11 11:54:05,729 ERROR recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailed(1027)) - State store operation failed org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /rmstore/ZKRMStateRoot/RMAppRoot/application_1447242474882_0002/appattempt_1447242474882_0002_01 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$9.run(ZKRMStateStore.java:1079) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1164) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1197) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:1079) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:716) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(R
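The gist of the YARN-4348 report above is that syncInternal can declare the sync failed on the session-timeout bound even though the sync actually completes moments later. A minimal sketch of the proposed idea, bounding the wait on a dedicated resync window instead of the session timeout (the latch and names like zkResyncWaitTime are illustrative, not the actual ZKRMStateStore internals):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SyncWaitSketch {
    // Illustrative sketch only: wait for the async ZK sync callback, bounded
    // by a dedicated resync window rather than the ZK session timeout.
    static boolean waitForSync(CountDownLatch syncDone, long resyncWaitMs)
            throws InterruptedException {
        // Returns true only if the callback fires within the window.
        return syncDone.await(resyncWaitMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        // Simulate the ZK sync callback arriving after ~100 ms.
        new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
            }
            latch.countDown();
        }).start();
        // With a generous resync window, the "late" sync still succeeds.
        System.out.println(waitForSync(latch, 5000));
    }
}
```

With too small a window (as with reusing the wrong timeout), the same sync would be reported as a failure even though it completes shortly afterwards, which matches the log above.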
[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001381#comment-15001381 ] Daniel Templeton commented on YARN-4347: A couple of quibbles on the patch: {code} RMApp rmApp =appAttempt.rmContext.getRMApps().get( {code} needs a space on the other side of the equals. Also, your asserts in the test class should have a failure message as the first parameter. > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > Attachments: YARN-4347.1.patch > > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001380#comment-15001380 ] Li Lu commented on YARN-4234: - I think there are some miscommunications; please feel free to correct me if I'm wrong. There are actually two problems we're discussing: - For the ATS v1.5 entity write patch, the client should not contact the server. In this way we're offloading the traffic from the centralized server, which became a significant problem in ATS v1. - For the ATS compatibility issue raised in YARN-4183, we need to come up with a mechanism to let the clients and server coordinate. Specifically, the client may want to know the server's ATS version or where exactly it needs to write the timeline entities. Client-side configurations may not be consistent with the server in this case. So, maybe we want some mechanisms to coordinate on this? Not sure if I got the points right, but to me those two points are independent? > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001364#comment-15001364 ] Hadoop QA commented on YARN-4345: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s {color} | {color:green} the patch passed 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 33s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 36s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 109m 11s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_60 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_60 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_79 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_79 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-11 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771848/YARN-4345-v2.patch | | JIRA Issue | YARN-4345 | | Optional Tests | asflicense javac javadoc mvninstall unit
[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001359#comment-15001359 ] Wangda Tan commented on YARN-4347: -- +1, pending Jenkins. > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > Attachments: YARN-4347.1.patch > > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001334#comment-15001334 ] Jason Lowe commented on YARN-4234: -- One issue with having the client contact the server is that it adds a dependency that ATS v1.5 is explicitly trying to remove. Today we are running jobs that work just fine whether the ATS is up or not. If the client needs to fetch anything from the ATS, then jobs stop flowing as soon as the ATS is down. Since ATS v1 and v1.5 are not HA, that's undesirable. It puts the ATS on the critical path for jobs flowing through the cluster. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001325#comment-15001325 ] Li Lu commented on YARN-4234: - Thanks [~Naganarasimha], this is a good suggestion. Maybe we can put those self-descriptive features in a restful endpoint (I'm just thinking out loud)? > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001318#comment-15001318 ] Naganarasimha G R commented on YARN-4234: - Thanks [~gtCarrera] for the offline update about the approach of ATS 1.5; it seems to be different from the existing approaches in V1 and V2. The *Timeline client* is able to write directly to HDFS without the knowledge of the ATS server (reader). But one small suggestion: can the configuration information (like activePath) related to 1.5 be obtained from the server rather than the client relying on its local configuration? This would avoid configuration conflicts. Hope my understanding is right; correct me if I am wrong. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001309#comment-15001309 ] Jason Lowe commented on YARN-4051: -- Thanks for updating the patch! Should the value be infinite by default? The concern is that if one container has issues recovering (due to log aggregation woes or whatever) then we risk expiring all of the containers on this node if we don't re-register with the RM within the node expiry interval. I think it makes sense if we have also fixed the recovery paths so there aren't potentially long-running procedures (like contacting HDFS) during the recovery process. If we haven't then we could create as many problems as we're solving by waiting forever. Why does the patch change the check interval? If it's to reduce the logging then we can better fix that by only logging when the status changes rather than every iteration. Nit: A value of zero should also be treated as a disabled max time. Nit: "Max time to wait NM to complete container recover before register to RM " should be "Max time NM will wait to complete container recovery before registering with the RM". > ContainerKillEvent is lost when container is In New State and is recovering > > > Key: YARN-4051 > URL: https://issues.apache.org/jira/browse/YARN-4051 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee >Priority: Critical > Attachments: YARN-4051.01.patch, YARN-4051.02.patch, > YARN-4051.03.patch, YARN-4051.04.patch > > > As in YARN-4050, NM event dispatcher is blocked, and container is in New > state, when we finish application, the container still alive even after NM > event dispatcher is unblocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
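The behavior being reviewed above (NM waits for container recovery before registering with the RM, with a configurable maximum wait, and a non-positive value treated as "no limit" per the nit) can be sketched as a bounded poll loop. This is an illustrative sketch, not the actual NodeManager code; names like waitForRecovery and the intervals are assumptions:

```java
import java.util.function.BooleanSupplier;

public class RecoveryWaitSketch {
    // Poll the recovery flag every checkIntervalMs before registering with
    // the RM. A maxWaitMs <= 0 disables the cap (the review nit: zero should
    // also mean "wait indefinitely").
    static boolean waitForRecovery(BooleanSupplier recovered,
                                   long maxWaitMs, long checkIntervalMs)
            throws InterruptedException {
        long waited = 0;
        while (!recovered.getAsBoolean()) {
            if (maxWaitMs > 0 && waited >= maxWaitMs) {
                return false; // gave up; register with the RM anyway
            }
            Thread.sleep(checkIntervalMs);
            waited += checkIntervalMs;
        }
        return true; // recovery finished before registration
    }

    public static void main(String[] args) throws Exception {
        // Simulated recovery that completes on the third poll.
        final int[] polls = {0};
        BooleanSupplier done = () -> ++polls[0] >= 3;
        // maxWaitMs == 0 is treated as unlimited, so this succeeds:
        System.out.println(waitForRecovery(done, 0, 1));
        // A 1 ms cap with a recovery that never completes gives up:
        System.out.println(waitForRecovery(() -> false, 1, 1));
    }
}
```

Jason's concern maps directly onto the capped branch: a container stuck in recovery (e.g. waiting on HDFS) either delays registration past the node-expiry interval or forces the NM to register with incomplete recovery.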
[jira] [Updated] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4347: -- Attachment: YARN-4347.1.patch > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > Attachments: YARN-4347.1.patch > > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001277#comment-15001277 ] Jian He commented on YARN-4347: --- Thanks [~yeshavora] for reporting this. The issue here is that the attempt's final state was previously not recorded, but the app's final state was recorded. So, when it recovers, an NPE is thrown. Working on a patch to fix the attempt recovery path - if the corresponding app's recovered state is a final state, return the attempt's final state accordingly, instead of adding it into the scheduler again. > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
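The recovery guard Jian He describes in his comment can be sketched as follows. This is an illustrative sketch under assumed names (the enum and recoverAttempt are hypothetical, not the actual RMAppAttemptImpl code): if the application already reached a terminal state before the restart, the recovered attempt is moved straight to its final state instead of being re-added to the scheduler, which is the path that triggered the NPE.

```java
// Hypothetical app states for illustration; YARN's real state machines
// have more states and transitions.
enum AppState { RUNNING, FINISHED, FAILED, KILLED }

public class AttemptRecoverySketch {
    static boolean isFinalState(AppState s) {
        return s == AppState.FINISHED || s == AppState.FAILED
                || s == AppState.KILLED;
    }

    // Sketch of the fix: skip the scheduler for attempts of completed apps.
    static String recoverAttempt(AppState recoveredAppState) {
        if (isFinalState(recoveredAppState)) {
            return "attempt moved to final state"; // no scheduler involvement
        }
        return "attempt re-added to scheduler";
    }

    public static void main(String[] args) {
        System.out.println(recoverAttempt(AppState.FINISHED));
        System.out.println(recoverAttempt(AppState.RUNNING));
    }
}
```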
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001278#comment-15001278 ] Jason Lowe commented on YARN-4132: -- Patch doesn't build. Other comments: Do we need to get rid of the old RMProxy.createRetryPolicy(conf) method? We can just leave that around to reduce code churn on the patch and have it implemented in terms of the new createRetryPolicy method. Similarly for maximum backwards compatibility we could leave around an RMProxy.createRMProxy method that doesn't take the extra values and calls the old createRetryPolicy method. A new private utility method, like createRMProxy(conf, protocol, instance, retryPolicy) could be added to factor out the common code between the two createRMProxy methods. "to connection to RM" should be "to connect to the RM" in the property descriptions in yarn-default.xml. > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, > YARN-4132.5.patch, YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
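Jason's suggestion amounts to keeping the old entry points as thin wrappers that delegate to the new, more configurable ones. A minimal sketch of that delegation pattern — class name, method shapes, and config keys are simplified illustrations, not the actual RMProxy signatures:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the backward-compatibility pattern suggested above: the old
// single-argument entry point survives and simply forwards to the new
// method with the legacy defaults, so existing callers keep working.
public class RetryPolicyCompat {

    // New method: caller supplies the retry knobs explicitly.
    static String createRetryPolicy(long waitMs, long intervalMs) {
        return "retry(wait=" + waitMs + ",interval=" + intervalMs + ")";
    }

    // Old method kept around to reduce code churn: reads the legacy
    // config keys (names illustrative) and delegates to the new method.
    static String createRetryPolicy(Map<String, Long> conf) {
        long wait = conf.getOrDefault("resourcemanager.connect.max-wait.ms", 900_000L);
        long interval = conf.getOrDefault("resourcemanager.connect.retry-interval.ms", 30_000L);
        return createRetryPolicy(wait, interval);
    }

    public static void main(String[] args) {
        // Legacy callers pass only the configuration and get the defaults.
        System.out.println(createRetryPolicy(new HashMap<>()));
    }
}
```

The same shape applies to the proposed private `createRMProxy(conf, protocol, instance, retryPolicy)` utility: both public `createRMProxy` variants would build their retry policy and then call the shared private method.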
[jira] [Assigned] (YARN-4347) Resource manager fails with Null pointer exception
[ https://issues.apache.org/jira/browse/YARN-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-4347: - Assignee: Jian He > Resource manager fails with Null pointer exception > -- > > Key: YARN-4347 > URL: https://issues.apache.org/jira/browse/YARN-4347 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Yesha Vora >Assignee: Jian He > > Resource manager fails with NPE while trying to load or recover a finished > application. > {code} > 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(597)) - Failed to load/recover state > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) > at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4347) Resource manager fails with Null pointer exception
Yesha Vora created YARN-4347: Summary: Resource manager fails with Null pointer exception Key: YARN-4347 URL: https://issues.apache.org/jira/browse/YARN-4347 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Yesha Vora Resource manager fails with NPE while trying to load or recover a finished application. {code} 2015-11-11 17:53:22,351 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(597)) - Failed to load/recover state java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:746) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1155) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:116) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1037) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1001) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:839) at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:854) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:844) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:719) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:313) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:411) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1219) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:593) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1026) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1067) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1063) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001239#comment-15001239 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #658 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/658/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001224#comment-15001224 ] Li Lu commented on YARN-4234: - Well let me clarify one thing: ATS v1.5 supports all ATS v1 features, while ATS v2 is not compatible with ATS v1 and 1.5. Therefore, how about this: timeline-service.version: 1 -> only ATS v1 features are supported. timeline-service.version: 1.5 -> supports ATS v1 and v1.5 features. timeline-service.version: 2 -> only ATS v2 features are supported. This describes the feature set on the server side of ATS. The client can use this information from server to perform sanity check. I agree with [~Naganarasimha] that checking this config on the client side does not quite make sense, and clearly we need some mechanisms to let the client know the current running version of ATS. However, I think this is orthogonal to the config key proposed in YARN-4183 (and added here). > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
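The compatibility rules Li Lu describes (v1.5 is a superset of v1; v2 is incompatible with both) can be expressed as a small client-side sanity check. This is a hedged sketch of that logic, not actual TimelineClient code:

```java
public class AtsVersionCheck {

    // Returns true when a client written against clientVersion can talk to
    // a server running serverVersion, per the rules in the comment above:
    //   server 1   -> only v1 clients
    //   server 1.5 -> v1 and v1.5 clients
    //   server 2   -> only v2 clients (v2 breaks compatibility with 1.x)
    static boolean compatible(float clientVersion, float serverVersion) {
        if (serverVersion >= 2.0f || clientVersion >= 2.0f) {
            return clientVersion == serverVersion; // v2 stands alone
        }
        return clientVersion <= serverVersion;     // 1.x is backward compatible
    }

    public static void main(String[] args) {
        // A v1 client against a v1.5 server is fine; the reverse is not.
        System.out.println(compatible(1.0f, 1.5f));
        System.out.println(compatible(1.5f, 1.0f));
    }
}
```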
[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001207#comment-15001207 ] Junping Du commented on YARN-4345: -- Thanks Jason for the review! I updated the v2 patch to address your comments. > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4345-v2.patch, YARN-4345.patch > > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4345: - Attachment: YARN-4345-v2.patch > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4345-v2.patch, YARN-4345.patch > > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001202#comment-15001202 ] Li Lu commented on YARN-4234: - [~Naganarasimha] Sure, we can certainly add those extra sanity checks on the client. However, I believe we still need this config on the server side to describe the highest supported version number. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4345: - Target Version/s: 2.7.3 (was: 2.8.0) > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4345.patch > > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001192#comment-15001192 ] Junping Du commented on YARN-4345: -- I think the test failure is unrelated. I tried the test locally (TestYarnClient), and it fails with or without this patch. > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4345.patch > > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001187#comment-15001187 ] Jason Lowe commented on YARN-4345: -- Thanks for the patch, Junping! The main change looks OK, but the unit test needs some work. The unit test is missing the test decorator and therefore isn't executing. With that fixed, the unit test is passing even without the main fix. > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4345.patch > > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
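The failure mode Jason describes — a unit test that silently never runs because it lacks the test decorator — can be demonstrated with plain reflection. The sketch below uses a stand-in annotation rather than JUnit itself, and the test-method names are hypothetical:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class MissingDecoratorDemo {

    // Stand-in for JUnit's @Test: runners only pick up annotated methods.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Test {}

    static class NodeResourceTests {
        @Test void testBatchUpdate() {}
        // Missing the decorator: a runner silently skips this method,
        // which is exactly the problem described in the review.
        void testSingleNodeUpdate() {}
    }

    // Mimics what a test runner does: collect methods carrying @Test.
    static List<String> discoveredTests(Class<?> clazz) {
        List<String> found = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Test.class)) {
                found.add(m.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Only the annotated method is discovered.
        System.out.println(discoveredTests(NodeResourceTests.class));
    }
}
```

This is also why a test that passes "even without the main fix" is suspect: a skipped test passes vacuously, so the first check after adding the decorator is that the test fails on the unpatched code.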
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001171#comment-15001171 ] Naganarasimha G R commented on YARN-4234: - Hi [~gtCarrera], IIUC YARN-3488 is talking about if this server side configuration is configured differently in the client then though the user calls the right API's based on the client side ATS version, it will not be successfull in the server. So was wondering while starting of the timelineclient whether we can connect to the backend timeline server and check whether the server is for the specified ATS version in the client, or as YARN-3488 suggests receive the information related to the ATS version as part of RM register.Thoughts ? > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001155#comment-15001155 ] Li Lu commented on YARN-4265: - Thanks [~jlowe]! Yes, the current POC version is heavily based on YARN-3942. I'm doing some refactoring right now, but the general logic should be pretty similar to yours (thanks for the work!). I'll address your comments in the next version of the patch. bq. If no plugins are configured (which is the default behavior), do we want a fallback plugin that emulates what YARN-3942 is doing? That's a nice suggestion. We can build a plugin that simply returns the appid to simulate this. bq. Are there plans to support the leveldb store as an alternative to the memory store for the detail timeline store? There was concern that a single dag could overwhelm the server, and storing it to leveldb instead of the memory store would be one way to try to mitigate that. I'm wondering if the class to use for the detail log timeline store should be configurable with MemoryTimelineStore as the default. Could also do this as a followup JIRA if necessary. Will do. bq. Should add entries to yarn-default.xml for the new properties? Yes. Will add them. bq. Do we want to log at the info level that a path is being skipped during the scan? The store can end up scanning fairly often in practice, so this could end up logging a lot for just one path per scan. I'm wondering if making it a debug log is more appropriate. Sure. > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4265-trunk.poc_001.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. 
The store may have similar behavior as the > EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id > granularity, instead of application id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
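The fallback Li Lu proposes — a plugin that simply returns the application id as the cache id, reproducing the app-id granularity of YARN-3942 — could look roughly like this. The interface name and shape are assumptions for illustration, not the actual plugin API:

```java
import java.util.Collections;
import java.util.Set;

public class AppIdFallback {

    // Hypothetical plugin contract: map a stored entity to cache ids.
    interface CacheIdPlugin {
        Set<String> getCacheIds(String entityId, String appId);
    }

    // Default/fallback plugin: one cache group per application, which
    // emulates the app-id-granularity caching behavior of YARN-3942.
    static class AppIdCachePlugin implements CacheIdPlugin {
        @Override
        public Set<String> getCacheIds(String entityId, String appId) {
            return Collections.singleton(appId);
        }
    }

    public static void main(String[] args) {
        CacheIdPlugin plugin = new AppIdCachePlugin();
        // Illustrative app id; every entity of the app maps to one cache group.
        System.out.println(plugin.getCacheIds("dag_1", "application_1_0001"));
    }
}
```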
[jira] [Commented] (YARN-3784) Indicate preemption timout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001145#comment-15001145 ] Hadoop QA commented on YARN-3784: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 32s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 3s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 22s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 37s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 16s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 29s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 14s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s {color} | {color:red} Patch generated 5 new checkstyle issues in root (total was 392, now 389). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 53s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common introduced 1 new FindBugs issues. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 21s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 26s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 51s {color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 4s {color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} unit {color} | {col
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001141#comment-15001141 ] Wangda Tan commented on YARN-1509: -- +1 to latest patch, will commit it tomorrow if there's no opposite opinions. > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.10.patch, > YARN-1509.2.patch, YARN-1509.3.patch, YARN-1509.4.patch, YARN-1509.5.patch, > YARN-1509.6.patch, YARN-1509.7.patch, YARN-1509.8.patch, YARN-1509.9.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers
[ https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001134#comment-15001134 ] Jian He commented on YARN-1509: --- lgtm, thanks Meng ! > Make AMRMClient support send increase container request and get > increased/decreased containers > -- > > Key: YARN-1509 > URL: https://issues.apache.org/jira/browse/YARN-1509 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan (No longer used) >Assignee: MENG DING > Attachments: YARN-1509.1.patch, YARN-1509.10.patch, > YARN-1509.2.patch, YARN-1509.3.patch, YARN-1509.4.patch, YARN-1509.5.patch, > YARN-1509.6.patch, YARN-1509.7.patch, YARN-1509.8.patch, YARN-1509.9.patch > > > As described in YARN-1197, we need add API in AMRMClient to support > 1) Add increase request > 2) Can get successfully increased/decreased containers from RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
[ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001126#comment-15001126 ] Jason Lowe commented on YARN-4265: -- Thanks for the patch, [~gtCarrera9]! This looks like most of the patch is a copy of the entity timeline store from YARN-3942 with a few edits, so I'm sorta reviewing my own code here. As such I did a diff of the patch from this JIRA and the one from YARN-3942 so I could focus on what's changed. I'll defer to others to review the parts that are identical to YARN-3942. Eventually I can see this being a superset of YARN-3942, since it can cache to memory and either cache everything or a subset based on what the plugins decide. TIMELINE_SERVICE_PLUGIN_ENABLED and DEFAULT_TIMELINE_SERVICE_PLUGIN_ENABLED are not needed. Is there a reason we're not using the Configuration.getInstances method or the ReflectionUtils methods to handle plugin loading? If no plugins are configured (which is the default behavior), do we want a fallback plugin that emulates what YARN-3942 is doing? Are there plans to support the leveldb store as an alternative to the memory store for the detail timeline store? There was concern that a single dag could overwhelm the server, and storing it to leveldb instead of the memory store would be one way to try to mitigate that. I'm wondering if the class to use for the detail log timeline store should be configurable with MemoryTimelineStore as the default. Could also do this as a followup JIRA if necessary. Should add entries to yarn-default.xml for the new properties? Do we want to log at the info level that a path is being skipped during the scan? The store can end up scanning fairly often in practice, so this could end up logging a lot for just one path per scan. I'm wondering if making it a debug log is more appropriate. Comment in getDoneAppPath mentions a cache ID but it's using an app ID. 
Nit: Indentation is off at the start of the YarnConfiguration patch hunk. > Provide new timeline plugin storage to support fine-grained entity caching > -- > > Key: YARN-4265 > URL: https://issues.apache.org/jira/browse/YARN-4265 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4265-trunk.poc_001.patch > > > To support the newly proposed APIs in YARN-4234, we need to create a new > plugin timeline store. The store may have similar behavior to the > EntityFileTimelineStore proposed in YARN-3942, but cache data at cache-id > granularity, instead of application-id granularity. Let's have this storage > as a standalone one, instead of updating EntityFileTimelineStore, to keep the > existing store (EntityFileTimelineStore) stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
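The review above asks why the patch does not use Configuration.getInstances or the ReflectionUtils helpers for plugin loading. A minimal sketch of that reflective pattern is below — the CachePlugin interface, class names, and configuration value are illustrative stand-ins, not the actual Hadoop API or the patch's code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of reflective plugin loading: resolve a comma-separated list of
// class names from configuration and instantiate each via reflection,
// verifying it implements the expected plugin interface.
public class PluginLoader {

    // Hypothetical stand-in for a timeline cache plugin interface.
    public interface CachePlugin {
        String name();
    }

    // Hypothetical fallback plugin, emulating the "no plugins configured" case.
    public static class DefaultCachePlugin implements CachePlugin {
        @Override
        public String name() { return "default"; }
    }

    // Instantiate every class named in the configuration value.
    public static <T> List<T> getInstances(String confValue, Class<T> iface)
            throws Exception {
        List<T> instances = new ArrayList<>();
        if (confValue == null || confValue.trim().isEmpty()) {
            return instances;  // no plugins configured
        }
        for (String className : confValue.split(",")) {
            Class<?> clazz = Class.forName(className.trim());
            if (!iface.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(clazz + " is not a " + iface);
            }
            instances.add(iface.cast(clazz.getDeclaredConstructor().newInstance()));
        }
        return instances;
    }

    public static void main(String[] args) throws Exception {
        List<CachePlugin> plugins = getInstances(
            "PluginLoader$DefaultCachePlugin", CachePlugin.class);
        System.out.println(plugins.get(0).name());
    }
}
```

Configuration.getInstances encapsulates essentially this resolve-check-instantiate loop, which is why reusing it avoids duplicating the error handling in each store.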
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001108#comment-15001108 ] Li Lu commented on YARN-4234: - Hi [~Naganarasimha], IIUC the TIMELINE_SERVICE_VERSION config describes the server's version. The client API implementation will perform a sanity check to avoid calling a wrong (lower) API version on the server. This is orthogonal to assuming a particular version of ATS. The application can still decide for itself which API to call. Am I missing anything here? Thanks. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001093#comment-15001093 ] Naganarasimha G R commented on YARN-4234: - Hi [~xgong] & [~gtCarrera], Before we go ahead with supporting clients specifying the ATS version, it would be better to take a look at YARN-3488, which discusses the issues with each application having a client-side configuration of which timeline version to use. Basically, server-side validation would be preferable; with only a client-side check, it is more likely that an app will assume a particular ATS version while the server is actually running a different one. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001064#comment-15001064 ] Hadoop QA commented on YARN-4132: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 1s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 221, now 220). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-server-common in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {colo
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001061#comment-15001061 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1394 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1394/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001047#comment-15001047 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2598 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2598/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001009#comment-15001009 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #670 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/670/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001006#comment-15001006 ] Hadoop QA commented on YARN-4345: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 5s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 39s {color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 11s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 16s {color} | {color:green} the patch passed 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 11s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 36s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 39s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 106m 58s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_60 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_60 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_79 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_79 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-11 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771808/YARN-4345.patch | | JIRA Issue | YARN-4345 | | Optional Tests | asflicense javac javadoc mvninstall unit fin
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000962#comment-15000962 ] Xuan Gong commented on YARN-4234: - Attached a new patch with the renamed configurations. bq. CacheId -> TimelineEntityGroupId ? And similarly rename it everywhere. DONE bq. Also move this to org.apache.hadoop.yarn.api.records.timeline. DONE bq. Move CacheIdProto from yarn_protos.proto also accordingly I could not find a better place, so I kept it in yarn_protos.proto. bq. Reorder the API parameters DONE. bq. entity-file-fd.flush-interval-secs -> entity-file-store.fd-flush-interval-secs DONE bq. Similarly entity-file-fd.clean-interval-sec and entity-file-fd.retain-secs DONE bq. Why do we need TIMELINE_SERVICE_PLUGIN_ENABLED especially if we also have this TimelineEntityGroupId/CacheId as part of the writer API? REMOVED bq. TimelineClientImpl is doing too many things. Let's have a DirectTimelineWriter vs HDFSTimelineWriter which can encapsulate functionality. Will do it later. Also, I added a new configuration: TIMELINE_SERVICE_VERSION. To use ATS v1.5, this configuration must be set to 1.5, and when we try to use the new API for ATS v1.5, a sanity check will be enforced. > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
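The version sanity check described in the comment above can be sketched roughly as follows — the method and constant names here are illustrative stand-ins for whatever the patch actually wires to the {{yarn.timeline-service.version}} / TIMELINE_SERVICE_VERSION config, not the real TimelineClient code:

```java
// Sketch: reject the ATS v1.5 put API when the configured server version
// is lower than what the new API requires.
public class TimelineVersionCheck {

    static final float MIN_VERSION_FOR_V15_API = 1.5f;

    // True when the configured timeline server version supports the v1.5 API.
    public static boolean canUseV15Api(float configuredVersion) {
        return configuredVersion >= MIN_VERSION_FOR_V15_API;
    }

    // Entry point a v1.5-only put API could guard with the sanity check.
    public static void putEntitiesV15(float configuredVersion) {
        if (!canUseV15Api(configuredVersion)) {
            throw new UnsupportedOperationException(
                "Timeline v1.5 API requires version >= 1.5, configured: "
                + configuredVersion);
        }
        // ... write entities via the entity-file store ...
    }

    public static void main(String[] args) {
        System.out.println(canUseV15Api(1.5f));
        System.out.println(canUseV15Api(1.0f));
    }
}
```

This keeps the check on the client side, which is exactly the concern raised earlier in the thread about clients assuming a server version the cluster is not actually running.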
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.2015.1.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.20151110.1.patch, > YARN-4234.2015.1.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834
[ https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-4032: - Assignee: Jian He > Corrupted state from a previous version can still cause RM to fail with NPE > due to same reasons as YARN-2834 > > > Key: YARN-4032 > URL: https://issues.apache.org/jira/browse/YARN-4032 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Jian He >Priority: Critical > Attachments: YARN-4032.prelim.patch > > > YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if > someone is upgrading from a previous version, the state can still be > inconsistent and then RM will still fail with NPE after upgrade to 2.6.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000893#comment-15000893 ] Hudson commented on YARN-4183: -- FAILURE: Integrated in Hadoop-trunk-Commit #8793 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8793/]) YARN-4183. Reverting the patch to fix behaviour change. Revert (vinodkv: rev 6351d3fa638f1d901279cef9e56dc4e07ef3de11) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4346) Test committer.commitJob() behavior during committing when MR AM get failed.
Junping Du created YARN-4346: Summary: Test committer.commitJob() behavior during committing when MR AM get failed. Key: YARN-4346 URL: https://issues.apache.org/jira/browse/YARN-4346 Project: Hadoop YARN Issue Type: Improvement Reporter: Junping Du In MAPREDUCE-5485, we are adding an additional API (isCommitJobRepeatable) to allow job commit to tolerate AM failure in some cases (like FileOutputCommitter in the v2 algorithm). Although we have unit tests covering most of the flows, we may want a complete end-to-end test to verify the whole workflow. The scenarios include: 1. For FileOutputCommitter (or some subclass), emulate an MR AM failure or restart while commitJob() is in progress 2. Check the different behavior for v1 and v2 (supporting isCommitJobRepeatable() or not) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
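The decision the restarted AM has to make in the scenario above can be sketched like this — the committer classes below are stand-ins that only mirror the isCommitJobRepeatable contract from MAPREDUCE-5485, not the real FileOutputCommitter:

```java
// Sketch: an AM that restarts and finds a commitJob() was in flight may only
// re-run the commit when the committer advertises that it is repeatable
// (as the v2 FileOutputCommitter algorithm does).
public class CommitRecovery {

    interface Committer {
        boolean isCommitJobRepeatable();
    }

    // v1-style committer: a single final rename, not safe to repeat.
    static class V1Committer implements Committer {
        public boolean isCommitJobRepeatable() { return false; }
    }

    // v2-style committer: task outputs already promoted, commit is idempotent.
    static class V2Committer implements Committer {
        public boolean isCommitJobRepeatable() { return true; }
    }

    // What the restarted AM should do on finding an interrupted commit.
    static String onRestartDuringCommit(Committer c) {
        return c.isCommitJobRepeatable() ? "RE-RUN COMMIT" : "FAIL JOB";
    }

    public static void main(String[] args) {
        System.out.println(onRestartDuringCommit(new V1Committer()));
        System.out.println(onRestartDuringCommit(new V2Committer()));
    }
}
```

An end-to-end test would then kill the AM mid-commit and assert that the job fails under v1 but completes after re-running the commit under v2.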
[jira] [Updated] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4183: -- Target Version/s: 2.7.3 (was: 2.7.2) > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4183: -- Fix Version/s: (was: 2.7.2) (was: 2.8.0) (was: 3.0.0) bq. All of this needs more work, so unless I hear strongly otherwise I am going to revert this patch in the interest of 2.7.2's progress. Seeing no No's, reverted this for the sake of 2.7.2. > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000857#comment-15000857 ] Li Lu commented on YARN-4183: - Discussed with [~xgong] offline, and seems like YARN-4234 is blocked on the proposed {{yarn.timeline-service.version}} config. If this is the case maybe we can add the config in YARN-4234 and have it reviewed quickly? > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4345: - Attachment: YARN-4345.patch The problem is quite straightforward. Uploaded a quick patch to fix it. > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4345.patch > > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4345: - Issue Type: Sub-task (was: Bug) Parent: YARN-291 > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4345: - Reporter: Sushmitha Sreenivasan (was: Junping Du) > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4345: - Target Version/s: 2.8.0 (was: 2.8.0, 2.7.3) > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du moved MAPREDUCE-6544 to YARN-4345: - Affects Version/s: (was: 2.7.1) 2.7.1 Target Version/s: 2.8.0, 2.7.3 (was: 2.7.3) Component/s: (was: resourcemanager) resourcemanager Key: YARN-4345 (was: MAPREDUCE-6544) Project: Hadoop YARN (was: Hadoop Map/Reduce) > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > YARN-313 add CLI to update node resource. It works fine for batch mode > update. However, for single node update "yarn rmadmin -updateNodeResource" > failed to work because resource is not set properly in sending request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000770#comment-15000770 ] Wangda Tan commented on YARN-4287: -- Thanks [~nroberts]! Patch looks good. +1, will commit in a few days if there are no objections. > Capacity Scheduler: Rack Locality improvement > - > > Key: YARN-4287 > URL: https://issues.apache.org/jira/browse/YARN-4287 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, > YARN-4287-minimal-v4.patch, YARN-4287-minimal.patch, YARN-4287-v2.patch, > YARN-4287-v3.patch, YARN-4287-v4.patch, YARN-4287.patch > > > YARN-4189 does an excellent job describing the issues with the current delay > scheduling algorithms within the capacity scheduler. The design proposal also > seems like a good direction. > This jira proposes a simple interim solution to the key issue we've been > experiencing on a regular basis: > - rackLocal assignments trickle out due to nodeLocalityDelay. This can have a > significant impact on things like CombineFileInputFormat, which targets very > specific nodes in its split calculations. > I'm not sure when YARN-4189 will become reality so I thought a simple interim > patch might make sense. The basic idea is simple: > 1) Separate delays for rackLocal and offSwitch (today there is only 1) > 2) When we're getting rackLocal assignments, subsequent rackLocal assignments > should not be delayed > Patch will be uploaded shortly. No big deal if the consensus is to go > straight to YARN-4189. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
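The two-knob idea in the patch can be illustrated with a capacity-scheduler.xml fragment. The first property is the existing node-locality delay; the second reflects the additional rack-local delay the patch introduces. Treat the second property name and both values as illustrative assumptions based on the patch's intent, not authoritative configuration:

```xml
<!-- capacity-scheduler.xml (illustrative fragment) -->
<configuration>
  <!-- Existing knob: number of scheduling opportunities to skip while
       waiting for a node-local assignment before accepting rack-local. -->
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
  </property>
  <!-- New knob from this patch (name assumed from the patch intent):
       extra delay before falling back from rack-local to off-switch. -->
  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>20</value>
  </property>
</configuration>
```

With separate knobs, a cluster that is regularly getting rack-local assignments no longer pays the full node-locality delay before each subsequent rack-local assignment.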
[jira] [Updated] (YARN-3784) Indicate preemption timeout along with the list of containers to AM (preemption message)
[ https://issues.apache.org/jira/browse/YARN-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3784: -- Attachment: 0004-YARN-3784.patch Rebasing the patch as it has gone stale. [~djp], could you please take a look? > Indicate preemption timeout along with the list of containers to AM > (preemption message) > --- > > Key: YARN-3784 > URL: https://issues.apache.org/jira/browse/YARN-3784 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-3784.patch, 0002-YARN-3784.patch, > 0003-YARN-3784.patch, 0004-YARN-3784.patch > > > Currently during preemption, the AM is notified with a list of containers > which are marked for preemption. Introducing a timeout duration along with > this container list lets the AM know how much time it will get to do a > graceful shutdown of its containers (assuming a preemption policy is > loaded in the AM). > This will help in decommissioning NM scenarios, where the NM will be > decommissioned after a timeout (also killing containers on it). This timeout > will be helpful to indicate to the AM that those containers can be killed by > the RM forcefully after the timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000598#comment-15000598 ] Junping Du commented on YARN-3223: -- Thanks [~brookz] for updating the patch! I have just finished my review; here are my comments: 1. RMNodeImpl.java, {code}
+
+  /**
+   * Set total resource for the node.
+   * @param totalResource {@link Resource} New total resource
+   */
+  void setTotalResource(Resource totalResource);
{code} We should use RMNodeResourceUpdateEvent to update the resource instead of manipulating the resource on RMNode directly. Actually, a very similar interface (different only in name) was added in YARN-311 but got removed in YARN-312, because RMNode is supposed to be a read-only interface, and all internal state changes are triggered by events instead of direct calls. Please refer to a similar comment here: (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087) 2. AbstractYarnScheduler.java {code}
+  /**
+   * Update the cluster resource based on a particular node's changes.
+   */
+  private synchronized void updateClusterResources(RMNode nm,
+      SchedulerNode node, Resource oldResource, Resource newResource) {
+    // Notify NodeLabelsManager about this change
+    rmContext.getNodeLabelManager().updateNodeResource(nm.getNodeID(),
+        newResource);
+
+    // Log resource change
+    LOG.info("Update resource on node: " + node.getNodeName()
+        + " from: " + oldResource + ", to: " + newResource);
+
+    nodes.remove(nm.getNodeID());
+    updateMaximumAllocation(node, false);
+
+    // if node is decommissioning, preserve original total resource
+    if (nm.getState() == NodeState.DECOMMISSIONING) {
+      node.setOriginalTotalResource(oldResource);
+    }
+    // update resource to node
+    node.setTotalResource(newResource);
+
+    // Update RMNode
+    nm.setTotalResource(newResource);
+
+    nodes.put(nm.getNodeID(), (N) node);
+    updateMaximumAllocation(node, true);
+
+    // update resource to clusterResource
+    Resources.subtractFrom(clusterResource, oldResource);
+    Resources.addTo(clusterResource, newResource);
+  }
{code} Instead of setting the new resource directly on RMNode and SchedulerNode, we should add a new transition (it can be combined with UpdateNodeResourceWhenRunningTransition) for RMNodeImpl to handle RESOURCE_UPDATE for a node in the decommissioning stage. It would handle the scheduler node resource update and the cluster resource update there (along with some other logic we could be missing here). To recover the previous capacity after recommission, I would prefer we keep the original resource on the RMNode side instead of the SchedulerNode, as the value belongs more to the RMNode itself, and the state transition from running to decommissioning will take a snapshot of the resource at that moment. The original resource can be updated when a node resource update is triggered from the CLI, but not when a running container finishes. BTW, it would be nice to have tests covering the key scenarios below: 1. After an NM with running containers is put into the decommissioning stage, its total capacity can be updated to its used capacity, but the original capacity is kept. 2. The total capacity gets updated when a running container finishes. 3. After recommissioning, the total capacity of the NM goes back to the original resource. > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch > > > During NM graceful decommission, we should handle the resource update > properly, including: make RMNode keep track of the old resource for possible > rollback, keep the available resource at 0, and update the used resource when > containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
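The behavior requested in the review above (mutate RMNode state only in response to a RESOURCE_UPDATE event, snapshotting the pre-decommission capacity for rollback on recommission) can be sketched in plain Java. This is a simplified stand-in model for illustration only: RMNodeModel and its single memory field are invented names, not the actual YARN classes or transitions.

```java
/**
 * Simplified model of the suggested event-driven resource update: state is
 * only mutated by event handlers, and the first RESOURCE_UPDATE received
 * while DECOMMISSIONING snapshots the original capacity for rollback.
 */
class RMNodeModel {
    enum State { RUNNING, DECOMMISSIONING }

    State state = State.RUNNING;
    int totalMemMB;          // stands in for the full Resource object
    Integer originalMemMB;   // snapshot kept for rollback on recommission

    RMNodeModel(int memMB) { this.totalMemMB = memMB; }

    /** Handler for a RESOURCE_UPDATE event (cf. the suggested transition). */
    void onResourceUpdate(int newMemMB) {
        if (state == State.DECOMMISSIONING && originalMemMB == null) {
            originalMemMB = totalMemMB;   // preserve pre-decommission capacity
        }
        totalMemMB = newMemMB;
    }

    void startDecommissioning() { state = State.DECOMMISSIONING; }

    /** Recommission: restore the snapshotted capacity, if any. */
    void recommission() {
        if (originalMemMB != null) {
            totalMemMB = originalMemMB;
            originalMemMB = null;
        }
        state = State.RUNNING;
    }
}

public class DecommissionResourceDemo {
    public static void main(String[] args) {
        RMNodeModel node = new RMNodeModel(8192);
        node.startDecommissioning();
        node.onResourceUpdate(4096);  // shrink toward used capacity
        node.onResourceUpdate(2048);  // a container finished
        System.out.println(node.totalMemMB);   // 2048
        node.recommission();
        System.out.println(node.totalMemMB);   // 8192
    }
}
```

This covers the three test scenarios listed above: the total capacity tracks the used capacity while decommissioning, the original capacity is kept, and recommissioning restores it.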
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000521#comment-15000521 ] Jason Lowe commented on YARN-4344: -- Thanks for the patch, Varun! I think the change will fix the reported issue, but I'm a bit skeptical of the vastly different handling of the event based on whether apps are running or not. For example, if the http port is changing when the node re-registers, why are we treating it as a node removal then addition if there aren't any apps running but not if there are apps running? Seems like that should be consistent. Comments on the patch itself: The comment about sending the node removal event at the start of the main block in the transition is no longer very accurate. Please don't put large sleeps (on the order of seconds) in tests. These extra sleep seconds quickly add up to a significant amount of time over many tests. If we need to sleep for polling reasons the sleep should be much shorter, like on the order of 10ms. Better than sleep-polling is flushing the event dispatcher and then checking since we can avoid polling entirely. Nit: isCapabilityChanged init can be simplified to the following, similar to the noRunningApps boolean init above it: {code} boolean isCapabilityChanged = !rmNode.getTotalCapability().equals(newNode.getTotalCapability()); {code} Nit: is this conditional check even necessary? We can just update the total capability with no semantic effect if it hasn't changed. Since this is just updating a reference with another precomputed one, it's not like we're avoiding some expensive code. 
;-) {code} if (isCapabilityChanged) { rmNode.totalCapability = newNode.getTotalCapability(); } {code} > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4344.001.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
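The "flush the event dispatcher, then check" pattern suggested above can be sketched with a toy single-threaded dispatcher. YARN's test-side DrainDispatcher does the real multi-threaded version of this; the class below is an invented stand-in purely to show why draining beats sleep-polling in tests.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Toy dispatcher: events queue up until drain() processes them all. */
class DrainingDispatcher<E> {
    private final Queue<E> queue = new ArrayDeque<>();
    private final Consumer<E> handler;

    DrainingDispatcher(Consumer<E> handler) { this.handler = handler; }

    void dispatch(E event) { queue.add(event); }

    /** Analogue of DrainDispatcher#await(): process everything pending. */
    void drain() {
        E e;
        while ((e = queue.poll()) != null) { handler.accept(e); }
    }
}

public class DispatcherFlushDemo {
    static int clusterMemMB = 0;

    public static void main(String[] args) {
        DrainingDispatcher<Integer> d =
            new DrainingDispatcher<>(delta -> clusterMemMB += delta);
        d.dispatch(4096);   // node added
        d.dispatch(-4096);  // node removed
        d.dispatch(8192);   // node re-added with a new capability
        d.drain();          // flush: no Thread.sleep() before asserting
        System.out.println(clusterMemMB);
    }
}
```

After drain() returns, the test can assert on cluster state immediately, instead of sleeping for seconds and hoping the dispatcher has caught up.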
[jira] [Updated] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4132: --- Attachment: YARN-4132.5.patch Thanks [~jlowe] for the review! The .5 patch has only one createRMProxy, which takes two additional inputs: retry time and retry interval. ServerRMProxy and ClientRMProxy pass those two inputs according to different values in the conf. The conf naming is fixed, and the test is also tuned down to around 4 seconds. > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, > YARN-4132.5.patch, YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
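The retry-time/retry-interval pair described above behaves like a "retry up to a maximum time with a fixed sleep" policy. Below is a minimal self-contained sketch of that behavior; connectWithRetry and its parameters are invented for illustration and assume nothing about the real RMProxy internals.

```java
import java.util.function.BooleanSupplier;

/** Sketch of retrying a connection attempt within a fixed time budget. */
public class RetryUntilDeadline {
    /**
     * Invoke {@code attempt} until it succeeds or the time budget
     * (maxWaitMs) is exhausted, pausing intervalMs between tries.
     */
    static boolean connectWithRetry(BooleanSupplier attempt,
                                    long maxWaitMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;                       // connected
            }
            if (System.currentTimeMillis() + intervalMs > deadline) {
                return false;                      // give up: budget spent
            }
            Thread.sleep(intervalMs);              // fixed sleep between tries
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice, then "connects" on the third attempt.
        boolean ok = connectWithRetry(() -> ++calls[0] >= 3, 1000, 5);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

A server-side proxy can pass a much larger (even effectively unbounded) maxWaitMs than a client would, which is exactly the separation the JIRA asks for.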
[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000432#comment-15000432 ] Tsuyoshi Ozawa commented on YARN-4241: -- Thanks Anthony for your contribution and thanks Akira for your committing. > Fix typo of property name in yarn-default.xml > - > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Anthony Rojas >Assignee: Anthony Rojas > Labels: newbie > Fix For: 2.8.0, 2.6.3, 2.7.3 > > Attachments: YARN-4241.002.patch, YARN-4241.003.patch, > YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1 > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4304: -- Attachment: 0001-YARN-4304.patch Uploading an initial version of patch. - For cluster metrics, I still followed the approach of available+allocated to get total resource of queue/cluster. So whenever a resource update happens, I think we need to recalculate the {{availableMB}} (cores) based on all partitions in that queue. This patch is now as per this idea. - In Scheduler page, all AM resource related fix is done. I will attach a screen shot soon. Kindly help to check the patch. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000361#comment-15000361 ] Hadoop QA commented on YARN-4344: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} 
| | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 49s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 1s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 132m 2s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_60 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-11 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771722/YARN-4344.001.patch | | JIRA Issue | YARN-4344 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Linux a9687b820f5f 3.13.0-36-lowlate
[jira] [Updated] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4344: Attachment: YARN-4344.001.patch Uploaded a patch with the fix. [~zxu], [~jlowe] - can you please take a look? > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-4344.001.patch > > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000238#comment-15000238 ] Hudson commented on YARN-4241: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2534 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2534/]) YARN-4241. Fix typo of property name in yarn-default.xml. Contributed by (aajisaka: rev 23d0db551cc63def9acbab2473e58fb1c52f85e0) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt > Fix typo of property name in yarn-default.xml > - > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Anthony Rojas >Assignee: Anthony Rojas > Labels: newbie > Fix For: 2.8.0, 2.6.3, 2.7.3 > > Attachments: YARN-4241.002.patch, YARN-4241.003.patch, > YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1 > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
[ https://issues.apache.org/jira/browse/YARN-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000229#comment-15000229 ] Varun Vasudev commented on YARN-4344: - An example of a situation is shown below - {code} 2015-11-09 10:43:51,784 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(345)) - NodeManager from node 10.0.0.64(cmPort: 30050 httpPort: 30060) registered with capability: , assigned nodeId 10.0.0.64:30050 2015-11-09 10:43:51,786 INFO rmnode.RMNodeImpl (RMNodeImpl.java:handle(434)) - 10.0.0.64:30050 Node Transitioned from NEW to RUNNING 2015-11-09 10:43:51,814 INFO capacity.CapacityScheduler (CapacityScheduler.java:addNode(1193)) - Added node 10.0.0.64:30050 clusterResource: 2015-11-09 10:44:37,878 INFO util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved 10.0.0.63 to /default-rack 2015-11-09 10:44:37,879 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(345)) - NodeManager from node 10.0.0.63(cmPort: 30050 httpPort: 30060) registered with capability: , assigned nodeId 10.0.0.63:30050 2015-11-09 10:44:37,879 INFO rmnode.RMNodeImpl (RMNodeImpl.java:handle(434)) - 10.0.0.63:30050 Node Transitioned from NEW to RUNNING 2015-11-09 10:44:37,882 INFO capacity.CapacityScheduler (CapacityScheduler.java:addNode(1193)) - Added node 10.0.0.63:30050 clusterResource: 2015-11-09 10:44:39,307 INFO util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved 10.0.0.64 to /default-rack 2015-11-09 10:44:39,309 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(313)) - Reconnect from the node at: 10.0.0.64 2015-11-09 10:44:39,312 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(345)) - NodeManager from node 10.0.0.64(cmPort: 30050 httpPort: 30060) registered with capability: , assigned nodeId 10.0.0.64:30050 2015-11-09 10:44:39,314 INFO 
capacity.CapacityScheduler (CapacityScheduler.java:removeNode(1247)) - Removed node 10.0.0.64:30050 clusterResource: 2015-11-09 10:44:39,315 INFO capacity.CapacityScheduler (CapacityScheduler.java:addNode(1193)) - Added node 10.0.0.64:30050 clusterResource: {code} In this case, the NMs from 10.0.0.64 and 10.0.0.63 registered, leading to a total cluster resource of clusterResource: . After that, 10.0.0.64 re-connected with changed capabilities (from to ). This should have led to the cluster resources becoming but instead it is calculated to be . The root cause is this piece of code from RMNodeImpl - {code}
rmNode.context.getDispatcher().getEventHandler().handle(
    new NodeRemovedSchedulerEvent(rmNode));

if (!rmNode.getTotalCapability().equals(
    newNode.getTotalCapability())) {
  rmNode.totalCapability = newNode.getTotalCapability();
{code} If the dispatcher is delayed in processing the event, then by the time the node removal is processed, rmNode.totalCapability = newNode.getTotalCapability() has already executed, and the resources that get removed are the changed capabilities, not the node's older capabilities. > NMs reconnecting with changed capabilities can lead to wrong cluster resource > calculations > -- > > Key: YARN-4344 > URL: https://issues.apache.org/jira/browse/YARN-4344 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Critical > > After YARN-3802, if an NM re-connects to the RM with changed capabilities, > there can arise situations where the overall cluster resource calculation for > the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
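The race can be reproduced with a toy dispatcher in plain Java (the classes below are invented stand-ins, not the real RM code). The node-removal event only carries a reference to the node, so if totalCapability is overwritten before the dispatcher catches up, the removal subtracts the new capability instead of the old one; snapshotting the old value at enqueue time fixes it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ReconnectRaceDemo {
    static class Node {
        int capabilityMB;
        Node(int c) { capabilityMB = c; }
    }

    /**
     * Simulates a 2048 MB node reconnecting with 4096 MB while a 4096 MB
     * node stays up. Returns the resulting cluster memory total.
     */
    static int reconnect(boolean snapshotOldCapability) {
        Queue<Runnable> dispatcher = new ArrayDeque<>();
        int[] clusterMB = {4096 + 2048};   // two nodes registered
        Node node = new Node(2048);

        if (snapshotOldCapability) {
            final int old = node.capabilityMB;  // fix: capture before mutating
            dispatcher.add(() -> clusterMB[0] -= old);
        } else {
            // bug: the removal event reads the capability lazily
            dispatcher.add(() -> clusterMB[0] -= node.capabilityMB);
        }
        node.capabilityMB = 4096;               // capability updated "too early"
        dispatcher.add(() -> clusterMB[0] += node.capabilityMB);

        while (!dispatcher.isEmpty()) {         // dispatcher catches up late
            dispatcher.poll().run();
        }
        return clusterMB[0];
    }

    public static void main(String[] args) {
        System.out.println(reconnect(false));  // 6144 -- wrong, as in the log
        System.out.println(reconnect(true));   // 8192 -- correct total
    }
}
```

The buggy path under-counts by exactly the capability delta (2048 MB here), matching the miscalculation described in the comment.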
[jira] [Created] (YARN-4344) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations
Varun Vasudev created YARN-4344: --- Summary: NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations Key: YARN-4344 URL: https://issues.apache.org/jira/browse/YARN-4344 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.2, 2.7.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Critical After YARN-3802, if an NM re-connects to the RM with changed capabilities, there can arise situations where the overall cluster resource calculation for the cluster will be incorrect leading to inconsistencies in scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4241) Fix typo of property name in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000197#comment-15000197 ] Hudson commented on YARN-4241: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #655 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/655/]) YARN-4241. Fix typo of property name in yarn-default.xml. Contributed by (aajisaka: rev 23d0db551cc63def9acbab2473e58fb1c52f85e0) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Fix typo of property name in yarn-default.xml > - > > Key: YARN-4241 > URL: https://issues.apache.org/jira/browse/YARN-4241 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Anthony Rojas >Assignee: Anthony Rojas > Labels: newbie > Fix For: 2.8.0, 2.6.3, 2.7.3 > > Attachments: YARN-4241.002.patch, YARN-4241.003.patch, > YARN-4241.branch-2.7.patch, YARN-4241.patch, YARN-4241.patch.1 > > > Typo in description section of yarn-default.xml, under the properties: > yarn.nodemanager.disk-health-checker.min-healthy-disks > yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage > yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb > yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage > The reference to yarn-nodemanager.local-dirs should be > yarn.nodemanager.local-dirs -- This message was sent by Atlassian JIRA (v6.3.4#6332)