[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-29 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940655#comment-16940655
 ] 

Bibin Chundatt commented on YARN-9858:
--

Thank you [~jhung]

+1, LGTM for the latest patches. I will commit by EOD if there are no objections.

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858-branch-2.001.patch, 
> YARN-9858-branch-3.1.001.patch, YARN-9858-branch-3.2.001.patch, 
> YARN-9858.001.patch, YARN-9858.002.patch, YARN-9858.003.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a 
> hot code path and needs to be optimized.
> Since the AMS allocate call is invoked by multiple handlers, lock contention on the conf object occurs:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}
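
A common way to take Configuration out of this hot path is to resolve the partition list once and serve later calls from an immutable cached copy. The sketch below is illustrative only; the class, field, and property names are assumptions, not the actual YARN-9858 patch:

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: compute the partition set once and let AMS allocate
// handlers read the cached, immutable copy, so they no longer contend on the
// synchronized Configuration.getProps().
class ExclusiveEnforcedPartitionsCache {
  private final Configuration conf;
  private volatile Set<String> exclusiveEnforcedPartitions;

  ExclusiveEnforcedPartitionsCache(Configuration conf) {
    this.conf = conf;
  }

  Set<String> getExclusiveEnforcedPartitions() {
    Set<String> partitions = exclusiveEnforcedPartitions;
    if (partitions == null) {                     // double-checked lazy init
      synchronized (this) {
        partitions = exclusiveEnforcedPartitions;
        if (partitions == null) {
          // Single Configuration read; the property key below is assumed for illustration.
          partitions = Collections.unmodifiableSet(new HashSet<>(
              conf.getTrimmedStringCollection(
                  "yarn.scheduler.capacity.exclusive-enforced-partitions")));
          exclusiveEnforcedPartitions = partitions;
        }
      }
    }
    return partitions;
  }
}
{code}

Whether the set is filled eagerly at RM startup or lazily as above is a detail of the actual patch.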



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions

2019-09-29 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940655#comment-16940655
 ] 

Bibin Chundatt edited comment on YARN-9858 at 9/30/19 5:03 AM:
---

Thank you [~jhung]

+1, LGTM for the latest patches. I will commit by EOD if there are no objections.


was (Author: bibinchundatt):
Thank yoy [~jhung]

+1 LGTM  for latest patches.  I will commit it by EOD if no objections.

> Optimize RMContext getExclusiveEnforcedPartitions 
> --
>
> Key: YARN-9858
> URL: https://issues.apache.org/jira/browse/YARN-9858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9858-branch-2.001.patch, 
> YARN-9858-branch-3.1.001.patch, YARN-9858-branch-3.2.001.patch, 
> YARN-9858.001.patch, YARN-9858.002.patch, YARN-9858.003.patch
>
>
> Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a 
> hot code path and needs to be optimized.
> Since the AMS allocate call is invoked by multiple handlers, lock contention on the conf object occurs:
> {code}
> java.lang.Thread.State: BLOCKED (on object monitor)
>  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841)
>  - waiting to lock <0x7f1f8107c748> (a 
> org.apache.hadoop.yarn.conf.YarnConfiguration)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940535#comment-16940535
 ] 

Hadoop QA commented on YARN-9864:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9864 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981728/YARN-9864-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 518601e7ca7c 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c0edc84 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24865/testReport/ |
| Max. process+thread count | 831 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24865/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Format CS Configuration present in Configuration 

[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store

2019-09-29 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940512#comment-16940512
 ] 

Prabhu Joseph commented on YARN-9864:
-

*Functionality Test Case:*

1. Tested the format command with all configuration stores 
(yarn.scheduler.configuration.store.class = memory / fs / leveldb / zk):

curl http://:8088/ws/v1/cluster/scheduler-conf/format
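
For reference, a minimal yarn-site.xml fragment selecting the store being formatted; zk is used here purely as an example, and the other short names accepted by the property are the ones listed above:

{code}
<!-- yarn-site.xml: backing store for the scheduler configuration mutation API.
     Accepted short values include memory, leveldb, zk and fs. -->
<property>
  <name>yarn.scheduler.configuration.store.class</name>
  <value>zk</value>
</property>
{code}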

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940510#comment-16940510
 ] 

Hadoop QA commented on YARN-9842:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 17m 
12s{color} | {color:red} Docker failed to build yetus/hadoop:20ca6773f1e. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9842 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981724/YARN-9842-branch-3.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24863/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Port YARN-9608 DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode to branch-3.0/branch-2
> --
>
> Key: YARN-9842
> URL: https://issues.apache.org/jira/browse/YARN-9842
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Srinivas S T
>Priority: Major
> Attachments: YARN-9842-branch-2.001.patch, 
> YARN-9842-branch-3.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9864) Format CS Configuration present in Configuration Store

2019-09-29 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9864:

Attachment: YARN-9864-002.patch

> Format CS Configuration present in Configuration Store
> --
>
> Key: YARN-9864
> URL: https://issues.apache.org/jira/browse/YARN-9864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9864-001.patch, YARN-9864-002.patch
>
>
> This provides an option to format the configuration changes present in 
> ConfigurationStore (ZK, LevelDB) and reinitialize from the Local 
> Capacity-scheduler.xml.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.

2019-09-29 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940508#comment-16940508
 ] 

Wangda Tan commented on YARN-9656:
--

[~pgolash], how about just defining these nodes as unhealthy? The script that 
checks whether a node is healthy can be defined by the admin. If, in production, a 
node has CPU utilization high enough to cause job issues, it seems reasonable to me 
to define that node as unhealthy.
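
One existing hook for this is the NodeManager health-checker script 
(yarn.nodemanager.health-checker.script.path): any output line starting with ERROR 
marks the node unhealthy and the RM stops scheduling on it. A rough sketch of such an 
admin-defined script; the 90% CPU threshold is an arbitrary example value:

{code}
#!/bin/bash
# Admin-defined NM health script (wired in via
# yarn.nodemanager.health-checker.script.path). A line starting with "ERROR"
# makes the NodeManager report the node as unhealthy.
THRESHOLD=90   # percent CPU; arbitrary example value

CORES=$(nproc)
LOAD=$(awk '{print $1}' /proc/loadavg)
# 1-minute load average normalized by core count, as an integer percentage
UTIL=$(awk -v l="$LOAD" -v c="$CORES" 'BEGIN {printf "%d", (l / c) * 100}')

if [ "$UTIL" -ge "$THRESHOLD" ]; then
  echo "ERROR node CPU utilization ${UTIL}% exceeds ${THRESHOLD}%"
else
  echo "OK"
fi
{code}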

> Plugin to avoid scheduling jobs on node which are not in "schedulable" state, 
> but are healthy otherwise.
> 
>
> Key: YARN-9656
> URL: https://issues.apache.org/jira/browse/YARN-9656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.9.1, 3.1.2
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Major
> Attachments: 2.patch
>
>
> Creating this Jira to get input from the community on whether this is something 
> helpful that can be done in YARN. Sometimes nodes go into a bad state, e.g. due to 
> hardware problems (bad I/O, fan failure). In other scenarios, if CGroups are not 
> enabled, nodes may run very high on CPU and the jobs scheduled on them will suffer.
>  
> The idea is three-fold:
>  # Gather relevant metrics from node-managers and put them in some form (e.g. an 
> exclude file).
>  # The RM loads the file and puts the nodes on a blacklist.
>  # Once a node becomes good again, it can be put back on the whitelist.
> Various optimizations can be done here, but I would like to understand whether 
> this could be helpful as an upstream feature in YARN.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.

2019-09-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940503#comment-16940503
 ] 

Hadoop QA commented on YARN-9656:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} YARN-9656 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9656 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981709/2.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24864/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Plugin to avoid scheduling jobs on node which are not in "schedulable" state, 
> but are healthy otherwise.
> 
>
> Key: YARN-9656
> URL: https://issues.apache.org/jira/browse/YARN-9656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.9.1, 3.1.2
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Major
> Attachments: 2.patch
>
>
> Creating this Jira to get input from the community on whether this is something 
> helpful that can be done in YARN. Sometimes nodes go into a bad state, e.g. due to 
> hardware problems (bad I/O, fan failure). In other scenarios, if CGroups are not 
> enabled, nodes may run very high on CPU and the jobs scheduled on them will suffer.
>  
> The idea is three-fold:
>  # Gather relevant metrics from node-managers and put them in some form (e.g. an 
> exclude file).
>  # The RM loads the file and puts the nodes on a blacklist.
>  # Once a node becomes good again, it can be put back on the whitelist.
> Various optimizations can be done here, but I would like to understand whether 
> this could be helpful as an upstream feature in YARN.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-29 Thread Srinivas S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940430#comment-16940430
 ] 

Srinivas S T commented on YARN-9842:


Patch for 3.0 [^YARN-9842-branch-3.001.patch] 

> Port YARN-9608 DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode to branch-3.0/branch-2
> --
>
> Key: YARN-9842
> URL: https://issues.apache.org/jira/browse/YARN-9842
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Srinivas S T
>Priority: Major
> Attachments: YARN-9842-branch-2.001.patch, 
> YARN-9842-branch-3.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-29 Thread Srinivas S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas S T updated YARN-9842:
---
Attachment: YARN-9842-branch-3.001.patch

> Port YARN-9608 DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode to branch-3.0/branch-2
> --
>
> Key: YARN-9842
> URL: https://issues.apache.org/jira/browse/YARN-9842
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Srinivas S T
>Priority: Major
> Attachments: YARN-9842-branch-2.001.patch, 
> YARN-9842-branch-3.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2019-09-29 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi reopened YARN-2368:
-

We could not only make yarn.resourcemanager.zk-jutemaxbuffer-bytes configurable, but 
also control whether we retry or simply log application information identifying which 
application caused the ZK buffer to blow up. That way we avoid the GC problems that 
arise when we retry too often and time out the ZK connection, and we can also find 
the root application that caused the buffer overflow.
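
As a rough illustration of the logging part of that idea (the method, logger, and surrounding 
store class are assumptions, not the actual ZKRMStateStore code; jute.maxbuffer itself is a 
ZooKeeper client/server system property, e.g. -Djute.maxbuffer=4194304):

{code}
// Sketch: before writing attempt state to ZK, compare the serialized size with
// the configured buffer limit and log which application would exceed it,
// instead of retrying blindly until the ZK session times out.
// ApplicationAttemptId is org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
// LOG is the store's logger (assumed context).
private boolean fitsInZkBuffer(ApplicationAttemptId attemptId, byte[] data,
    int maxBufferBytes) {
  if (data.length > maxBufferBytes) {
    LOG.warn("State for " + attemptId.getApplicationId() + " is " + data.length
        + " bytes, which exceeds the ZK buffer limit of " + maxBufferBytes
        + " bytes; logging instead of retrying.");
    return false;
  }
  return true;
}
{code}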

> ResourceManager failed when ZKRMStateStore tries to update znode data larger 
> than 1MB
> -
>
> Key: YARN-2368
> URL: https://issues.apache.org/jira/browse/YARN-2368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: zhuqi
>Priority: Critical
> Attachments: YARN-2368.patch
>
>
> Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. 
> The ZooKeeper log shows that ZKRMStateStore tries to update a znode 
> larger than 1MB, which is the default limit of the ZooKeeper server and 
> client set by 'jute.maxbuffer'.
> The ResourceManager (IP addr: 10.153.80.8) log shows the following:
> {code}
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2014-07-25 22:33:11,214 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Meanwhile, the ZooKeeper log shows the following:
> {code}
> 2014-07-25 22:10:09,728 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x247684586e70006 at /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x247684586e70006
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
> 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
> packet /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth 
> success /10.153.80.8:58890
> 

[jira] [Assigned] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2019-09-29 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi reassigned YARN-2368:
---

Assignee: zhuqi

> ResourceManager failed when ZKRMStateStore tries to update znode data larger 
> than 1MB
> -
>
> Key: YARN-2368
> URL: https://issues.apache.org/jira/browse/YARN-2368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: zhuqi
>Priority: Critical
> Attachments: YARN-2368.patch
>
>
> Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. 
> The ZooKeeper log shows that ZKRMStateStore tries to update a znode 
> larger than 1MB, which is the default limit of the ZooKeeper server and 
> client set by 'jute.maxbuffer'.
> The ResourceManager (IP addr: 10.153.80.8) log shows the following:
> {code}
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2014-07-25 22:33:11,214 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Meanwhile, the ZooKeeper log shows the following:
> {code}
> 2014-07-25 22:10:09,728 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x247684586e70006 at /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x247684586e70006
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
> 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
> packet /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth 
> success /10.153.80.8:58890
> 2014-07-25 22:10:09,742 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
> causing close of session 0x247684586e70006 due to java.io.IOException: Len 
> error 1530747
> 2014-07-25 22:10:09,743 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed 
> socket connection for client /10.153.80.8:58890 which 

[jira] [Commented] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-29 Thread Srinivas S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940265#comment-16940265
 ] 

Srinivas S T commented on YARN-9842:


Patch for 2.9 [^YARN-9842-branch-2.001.patch] 

> Port YARN-9608 DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode to branch-3.0/branch-2
> --
>
> Key: YARN-9842
> URL: https://issues.apache.org/jira/browse/YARN-9842
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Srinivas S T
>Priority: Major
> Attachments: YARN-9842-branch-2.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-29 Thread Srinivas S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas S T updated YARN-9842:
---
Attachment: YARN-9842-branch-2.001.patch

> Port YARN-9608 DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode to branch-3.0/branch-2
> --
>
> Key: YARN-9842
> URL: https://issues.apache.org/jira/browse/YARN-9842
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Srinivas S T
>Priority: Major
> Attachments: YARN-9842-branch-2.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org