[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940655#comment-16940655 ] Bibin Chundatt commented on YARN-9858: -- Thank you [~jhung], +1 LGTM for the latest patches. I will commit it by EOD if there are no objections. > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9858-branch-2.001.patch, > YARN-9858-branch-3.1.001.patch, YARN-9858-branch-3.2.001.patch, > YARN-9858.001.patch, YARN-9858.002.patch, YARN-9858.003.patch > > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path and needs to be optimized. > Since AMS allocate is invoked by multiple handlers, lock contention on the conf will occur: > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
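The fix direction implied by the stack trace above (stop re-reading the synchronized Configuration object on every allocate call, and serve a value computed once at RM initialization instead) can be sketched as follows. This is a hypothetical illustration, not the attached YARN-9858 patch; the class and method names are placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Sketch: parse the comma-separated partition list from configuration
// once at construction time (RM init), then serve the immutable cached
// set on every allocate call. The hot path never touches the
// synchronized Configuration object, so no lock contention can occur.
public class PartitionCache {
    private final Set<String> exclusiveEnforcedPartitions;

    public PartitionCache(String confValue) {
        Set<String> parsed = new HashSet<>();
        if (confValue != null && !confValue.isEmpty()) {
            parsed.addAll(Arrays.asList(confValue.split(",")));
        }
        this.exclusiveEnforcedPartitions = Collections.unmodifiableSet(parsed);
    }

    // Hot path: no lock, no Configuration access, no per-call parsing.
    public Set<String> getExclusiveEnforcedPartitions() {
        return exclusiveEnforcedPartitions;
    }
}
```

This trades the ability to pick up live configuration changes for a contention-free read, which is acceptable when the property is only read at RM startup.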
[jira] [Comment Edited] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940655#comment-16940655 ] Bibin Chundatt edited comment on YARN-9858 at 9/30/19 5:03 AM: --- Thank you [~jhung] +1 LGTM for latest patches. I will commit it by EOD if no objections. was (Author: bibinchundatt): Thank yoy [~jhung] +1 LGTM for latest patches. I will commit it by EOD if no objections.
[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store
[ https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940535#comment-16940535 ] Hadoop QA commented on YARN-9864: -

| (/) +1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 31s | trunk passed |
| +1 | compile | 0m 40s | trunk passed |
| +1 | checkstyle | 0m 34s | trunk passed |
| +1 | mvnsite | 0m 43s | trunk passed |
| +1 | shadedclient | 13m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 41s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 27s | the patch passed |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 7s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 17s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 84m 28s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 139m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9864 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981728/YARN-9864-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 518601e7ca7c 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c0edc84 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24865/testReport/ |
| Max. process+thread count | 831 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24865/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-9864) Format CS Configuration present in Configuration Store
[ https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940512#comment-16940512 ] Prabhu Joseph commented on YARN-9864: - *Functionality Test Case:* 1. Tested the format command with all configuration stores (yarn.scheduler.configuration.store.class = memory / fs / leveldb / zk): curl http://:8088/ws/v1/cluster/scheduler-conf/format > Format CS Configuration present in Configuration Store > -- > > Key: YARN-9864 > URL: https://issues.apache.org/jira/browse/YARN-9864 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9864-001.patch, YARN-9864-002.patch > > > This provides an option to format the configuration changes present in > ConfigurationStore (ZK, LevelDB) and reinitialize from the local > capacity-scheduler.xml.
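The semantics being tested above (per the issue description: discard the mutations accumulated in the configuration store and reinitialize from the local capacity-scheduler.xml) can be sketched with an in-memory stand-in. This is an illustrative model, not the actual YARN-9864 patch; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of what "format" does: drop the stored scheduler-conf
// mutations and reload the values from the local capacity-scheduler.xml
// (represented here as a pre-parsed key/value map).
public class ConfStoreFormat {
    private final Map<String, String> store = new HashMap<>();
    private final Map<String, String> localXml; // parsed capacity-scheduler.xml

    public ConfStoreFormat(Map<String, String> localXml) {
        this.localXml = localXml;
        store.putAll(localXml); // initial state mirrors the local file
    }

    // A scheduler-conf mutation, e.g. via the scheduler-conf REST API.
    public void update(String key, String value) {
        store.put(key, value);
    }

    // Format: discard all stored changes, reload the local file's values.
    public void format() {
        store.clear();
        store.putAll(localXml);
    }

    public String get(String key) {
        return store.get(key);
    }
}
```

The real implementation must do this against the configured backing store (memory, fs, leveldb, or zk), but the before/after behavior from the caller's point of view is the same.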
[jira] [Commented] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940510#comment-16940510 ] Hadoop QA commented on YARN-9842: -

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 17m 12s | Docker failed to build yetus/hadoop:20ca6773f1e. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9842 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981724/YARN-9842-branch-3.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24863/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Port YARN-9608 DecommissioningNodesWatcher should get lists of running > applications on node from RMNode to branch-3.0/branch-2 > -- > > Key: YARN-9842 > URL: https://issues.apache.org/jira/browse/YARN-9842 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Srinivas S T >Priority: Major > Attachments: YARN-9842-branch-2.001.patch, > YARN-9842-branch-3.001.patch > >
[jira] [Updated] (YARN-9864) Format CS Configuration present in Configuration Store
[ https://issues.apache.org/jira/browse/YARN-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9864: Attachment: YARN-9864-002.patch
[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.
[ https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940508#comment-16940508 ] Wangda Tan commented on YARN-9656: -- [~pgolash], how about simply defining these nodes as unhealthy? The script that checks whether a node is healthy can be defined by the admin. If, in production, a node has CPU utilization high enough to cause job issues, it seems reasonable to me to define that node as unhealthy. > Plugin to avoid scheduling jobs on node which are not in "schedulable" state, > but are healthy otherwise. > > > Key: YARN-9656 > URL: https://issues.apache.org/jira/browse/YARN-9656 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.9.1, 3.1.2 >Reporter: Prashant Golash >Assignee: Prashant Golash >Priority: Major > Attachments: 2.patch > > > Creating this Jira to get ideas from the community on whether this is > something helpful that can be done in YARN. Sometimes nodes go into a bad > state, e.g. due to hardware problems (bad I/O, fan failure). In other > scenarios, if CGroups are not enabled, nodes may be running very high on CPU > and the jobs scheduled on them will suffer. > > The idea is three-fold: > # Gather relevant metrics from node managers and put them in some form > (e.g. an exclude file). > # The RM loads the file and puts the nodes on the blacklist. > # Once a node becomes good again, it can be put back on the whitelist. > Various optimizations can be done here, but I would like to understand > whether this could be helpful as an upstream feature in YARN.
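The approach Wangda suggests, an admin-defined health script that flags an overloaded node as unhealthy, could look roughly like this. The NodeManager's health-check mechanism marks a node unhealthy when the check's output contains a line starting with "ERROR"; everything else below (the threshold, the /proc/loadavg parsing) is an illustrative assumption, not part of the issue.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch of a CPU-load health check in the spirit of NM health scripts:
// report "ERROR ..." when the per-core load exceeds a threshold, which
// the NM health checker interprets as "node unhealthy".
public class CpuHealthCheck {

    // Pure decision logic, separated out so it is easy to test.
    static String check(double load, int cores, double maxLoadPerCore) {
        if (load / cores > maxLoadPerCore) {
            return "ERROR: 1-min load " + load + " too high for " + cores + " cores";
        }
        return "OK";
    }

    public static void main(String[] args) throws Exception {
        // First field of /proc/loadavg is the 1-minute load average (Linux).
        String[] fields = new String(
            Files.readAllBytes(Paths.get("/proc/loadavg"))).split(" ");
        double load = Double.parseDouble(fields[0]);
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(check(load, cores, 2.0)); // 2.0 is an arbitrary threshold
    }
}
```

This keeps the "schedulable" decision entirely on the NM side, with no new RM-side blacklist plumbing.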
[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.
[ https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940503#comment-16940503 ] Hadoop QA commented on YARN-9656: -

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 8s | YARN-9656 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9656 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981709/2.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24864/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940430#comment-16940430 ] Srinivas S T commented on YARN-9842: Patch for 3.0 [^YARN-9842-branch-3.001.patch]
[jira] [Updated] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srinivas S T updated YARN-9842: --- Attachment: YARN-9842-branch-3.001.patch
[jira] [Reopened] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi reopened YARN-2368: - We should not only make yarn.resourcemanager.zk-jutemaxbuffer-bytes configurable, but also control whether we retry or simply log application info identifying which application caused the ZK buffer to blow up. That way we can avoid the GC problems that occur when we retry too much and time out the ZK connection, and we can also find the root application that caused the buffer overflow. > ResourceManager failed when ZKRMStateStore tries to update znode data larger > than 1MB > - > > Key: YARN-2368 > URL: https://issues.apache.org/jira/browse/YARN-2368 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo >Assignee: zhuqi >Priority: Critical > Attachments: YARN-2368.patch > > > Both ResourceManagers throw out STATE_STORE_OP_FAILED events and finally > failed. ZooKeeper log shows that ZKRMStateStore tries to update a znode > larger than 1MB, which is the default configuration of ZooKeeper server and > client in 'jute.maxbuffer'. > ResourceManager (ip addr: 10.153.80.8) log shows the following: > {code} > 2014-07-25 22:33:11,078 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2014-07-25 22:33:11,078 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2014-07-25 22:33:11,214 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED.
Cause: > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for > /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} > Meanwhile, ZooKeeps log shows as the following: > {code} > 2014-07-25 22:10:09,728 [myid:1] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - > Accepted 
socket connection from /10.153.80.8:58890 > 2014-07-25 22:10:09,730 [myid:1] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client > attempting to renew session 0x247684586e70006 at /10.153.80.8:58890 > 2014-07-25 22:10:09,730 [myid:1] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating > client: 0x247684586e70006 > 2014-07-25 22:10:09,730 [myid:1] - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session > 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890 > 2014-07-25 22:10:09,730 [myid:1] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth > packet /10.153.80.8:58890 > 2014-07-25 22:10:09,730 [myid:1] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth > success /10.153.80.8:58890 > 2014-07-25 22:10:09,742 [myid:1] - WARN > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception > causing close of session 0x247684586e70006 due to java.io.IOException: Len > error 1530747 > 2014-07-25 22:10:09,743 [myid:1] - INFO > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed > socket connection for client /10.153.80.8:58890 which
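zhuqi's proposal above (make the buffer limit configurable, and log which application produced the oversized znode instead of retrying until the ZK session times out) might be sketched like this. The class, method names, and logging are hypothetical illustrations, not the attached YARN-2368.patch.

```java
// Sketch: check the serialized state size against the configured
// jute.maxbuffer-style limit before writing to ZooKeeper. An oversized
// update is rejected and logged with the offending application id,
// rather than sent to ZK, where it would trip "Len error", force
// connection-loss retries, and eventually kill the RM.
public class ZnodeSizeGuard {
    private final int maxBufferBytes; // e.g. ZooKeeper's 1 MB default

    public ZnodeSizeGuard(int maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    // Returns true when the data is safe to store in a single znode.
    public boolean canStore(String appId, byte[] data) {
        if (data.length > maxBufferBytes) {
            System.err.println("Skipping ZK update for " + appId
                + ": serialized state " + data.length
                + " bytes exceeds limit " + maxBufferBytes);
            return false;
        }
        return true;
    }
}
```

Logging the application id directly addresses the "find the root application which caused the boom of the zk buffer" part of the comment.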
[jira] [Assigned] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi reassigned YARN-2368: --- Assignee: zhuqi
[jira] [Commented] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940265#comment-16940265 ] Srinivas S T commented on YARN-9842: Patch for 2.9 [^YARN-9842-branch-2.001.patch]
[jira] [Updated] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srinivas S T updated YARN-9842: --- Attachment: YARN-9842-branch-2.001.patch