[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551826#comment-14551826 ] Rohith commented on YARN-2268: --
Thanks [~sunilg], [~jianhe], [~kasha] for sharing your thoughts.
bq. Given we recommend using the ZK-store when using HA, how about adding this for the ZK-store using an ephemeral znode for lock first?
+1, given that ZKRMStateStore is the recommended state store.
bq. How about creating a lock file and declaring it stale after a stipulated period of time.
If we use a stipulated period, then within that period neither can the RM be started nor can the state store be formatted. Also, the file would have to be stored in HDFS regardless of which RMStateStore is configured, which adds an extra binding to the filesystem.
Why can't we use the general approach of polling the web service? It would give a more accurate state.
> Disallow formatting the RMStateStore when there is an RM running > > > Key: YARN-2268 > URL: https://issues.apache.org/jira/browse/YARN-2268 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Rohith > Attachments: 0001-YARN-2268.patch > > > YARN-2131 adds a way to format the RMStateStore. However, it can be a problem > if we format the store while an RM is actively using it. It would be nice to > fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551745#comment-14551745 ] Karthik Kambatla commented on YARN-2268:
Given we recommend using the ZK-store when using HA, how about adding this for the ZK-store first, using an ephemeral znode as the lock? We could think of alternate ways for other stores.
How about creating a lock file and declaring it stale after a stipulated period of time? It is a hacky approach, but might suffice?
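The stale-lock-file fallback suggested above could look roughly like the sketch below. This is a minimal illustration under stated assumptions and is not the attached patch. The lock is a file on a local `java.nio` path; a real implementation would keep it in the state store or HDFS, which is the dependency Rohith objects to. The file holds the holder's last heartbeat in epoch milliseconds. `StaleableLock`, `tryAcquire` and the staleness threshold are hypothetical names. The sketch has the weaknesses raised in this thread: a killed RM's lock blocks everyone until the threshold passes, and two processes that see a stale lock at the same moment can both take it over.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical sketch: a lock file whose content is the holder's last heartbeat
// (epoch millis). A lock older than staleAfter is treated as abandoned, e.g. left
// behind by an RM that was killed without cleaning up. Not race-free on takeover.
final class StaleableLock {
  private final Path lockFile;
  private final Duration staleAfter;

  StaleableLock(Path lockFile, Duration staleAfter) {
    this.lockFile = lockFile;
    this.staleAfter = staleAfter;
  }

  /** Tries to take the lock at time nowMillis; returns false if a live holder exists. */
  boolean tryAcquire(long nowMillis) throws IOException {
    try {
      // createFile is atomic: it fails if the file already exists.
      Files.createFile(lockFile);
      writeHeartbeat(nowMillis);
      return true;
    } catch (FileAlreadyExistsException e) {
      if (!isStale(nowMillis)) {
        return false;
      }
      // The previous holder stopped heartbeating; take over the lock.
      writeHeartbeat(nowMillis);
      return true;
    }
  }

  /** Called periodically by the holder so the lock does not go stale. */
  void writeHeartbeat(long nowMillis) throws IOException {
    Files.write(lockFile, Long.toString(nowMillis).getBytes(StandardCharsets.UTF_8));
  }

  boolean isStale(long nowMillis) throws IOException {
    String content = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).trim();
    if (content.isEmpty()) {
      return false; // holder created the file but has not written a heartbeat yet
    }
    long lastHeartbeat = Long.parseLong(content);
    return nowMillis - lastHeartbeat > staleAfter.toMillis();
  }

  void release() throws IOException {
    Files.deleteIfExists(lockFile);
  }
}
```

The threshold is the whole trade-off here. A short value lets a format proceed soon after an RM dies, but it forces a live RM to heartbeat often.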
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546296#comment-14546296 ] Jian He commented on YARN-2268: ---
I think the lock file solution only suits the ZK store, not other state-store implementations. The current approach of polling the web service is more general.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533311#comment-14533311 ] Xuan Gong commented on YARN-2268: -
Cancelling the patch; it looks like we need more discussion.
> Disallow formatting the RMStateStore when there is an RM running > > > Key: YARN-2268 > URL: https://issues.apache.org/jira/browse/YARN-2268 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Rohith > Labels: BB2015-05-TBR > Attachments: 0001-YARN-2268.patch > > > YARN-2131 adds a way to format the RMStateStore. However, it can be a problem > if we format the store while an RM is actively using it. It would be nice to > fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506574#comment-14506574 ] Sunil G commented on YARN-2268: ---
Hi [~rohithsharma], I feel we can keep the file in the state store itself and take the file lock there. Then, as you mentioned, we may be exposed to race conditions such as the RM being killed while the lock file remains, after which *format* cannot be performed at all.
Could we have *-soft* and *-hard* format options here? *-soft* would acquire the lock file before performing the operation. *-hard* would go in and format without regard to the lock. Please share your thoughts.
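Here is a minimal sketch of how the proposed *-soft* and *-hard* options could gate a format. Both flags exist only as a proposal in this comment and are not existing `yarn resourcemanager` options. `FormatMode`, `FormatDecision`, and the lock check passed in as a `BooleanSupplier` are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the proposed -soft / -hard format options.
enum FormatMode { SOFT, HARD }

final class FormatDecision {
  private FormatDecision() {}

  /**
   * @param mode     SOFT respects the lock file, HARD ignores it
   * @param lockHeld reports whether some RM currently holds the state-store lock
   * @return true if the format should proceed
   */
  static boolean shouldFormat(FormatMode mode, BooleanSupplier lockHeld) {
    switch (mode) {
      case HARD:
        // Operator override, e.g. to recover from a lock file left behind by a killed RM.
        return true;
      case SOFT:
        return !lockHeld.getAsBoolean();
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  /** Maps the proposed command-line flag to a mode. */
  static FormatMode parse(String arg) {
    switch (arg) {
      case "-soft":
        return FormatMode.SOFT;
      case "-hard":
        return FormatMode.HARD;
      default:
        throw new IllegalArgumentException("Expected -soft or -hard, got " + arg);
    }
  }
}
```

A caller would do something like `FormatDecision.shouldFormat(FormatDecision.parse("-soft"), () -> lockFileExists)`. The design keeps the safe path as the default behaviour and makes the override explicit.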
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506544#comment-14506544 ] Rohith commented on YARN-2268: --
Thanks [~vinodkv] and [~xgong] for pitching in. I am trying to understand the approach and have some doubts:
# Where would the lock file be created: in the state store, in HDFS, or on the local file system? If in the state store or HDFS, there are many error-handling scenarios to deal with. For example, if the RM is killed, the lock file remains in the state store, which prevents any other RM command from executing. Creating the file in HDFS would also require an additional dependency on HDFS in the RM.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505322#comment-14505322 ] Xuan Gong commented on YARN-2268: -
bq. If an active RM creates a "I am using the state-store" lock-file, then the command can bail out. Similarly, the command can create a "I am blowing up the state-store while you were presumably away", so that RM can crash deterministically when a format is in progress.
+1 for the proposal. This is probably the simplest way to fix the issue.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505296#comment-14505296 ] Vinod Kumar Vavilapalli commented on YARN-2268: ---
Further, this should also take care of YARN-3410, whichever patch goes in first.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505293#comment-14505293 ] Vinod Kumar Vavilapalli commented on YARN-2268: ---
At YARN-2131, I mentioned a lock file in passing. If an active RM creates an "I am using the state-store" lock file, then the format command can bail out. Similarly, the command can create an "I am blowing up the state-store while you were presumably away" file, so that the RM can crash deterministically when a format is in progress.
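The two-marker idea above can be sketched with two files in the store root. The active RM writes one marker and the format command writes the other. The marker names, the local `java.nio` directory standing in for the store, and the `StateStoreMarkers` class are all hypothetical. The existence check and the file creation are separate steps, so a small race window remains. Closing it would need an atomic primitive in the store itself, such as a ZooKeeper ephemeral znode.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the two-marker protocol: the RM advertises that it is
// using the store, and the format command advertises that a format is running.
final class StateStoreMarkers {
  static final String IN_USE = "RM_IN_USE";
  static final String FORMATTING = "FORMAT_IN_PROGRESS";

  private final Path storeRoot;

  StateStoreMarkers(Path storeRoot) {
    this.storeRoot = storeRoot;
  }

  /** Called by an RM when it becomes active. */
  void rmBecomeActive() throws IOException {
    if (Files.exists(storeRoot.resolve(FORMATTING))) {
      // Crash deterministically instead of racing with the format.
      throw new IllegalStateException("State store is being formatted");
    }
    createIfAbsent(storeRoot.resolve(IN_USE));
  }

  /** Called by an RM on clean shutdown or on transition to standby. */
  void rmRelease() throws IOException {
    Files.deleteIfExists(storeRoot.resolve(IN_USE));
  }

  /** Called by the format command before it deletes the store contents. */
  void beginFormat() throws IOException {
    if (Files.exists(storeRoot.resolve(IN_USE))) {
      throw new IllegalStateException("An RM is using the state store; refusing to format");
    }
    // createFile fails if another format is already running.
    Files.createFile(storeRoot.resolve(FORMATTING));
  }

  /** Called by the format command once the store has been wiped. */
  void endFormat() throws IOException {
    Files.deleteIfExists(storeRoot.resolve(FORMATTING));
  }

  private static void createIfAbsent(Path p) throws IOException {
    try {
      Files.createFile(p);
    } catch (FileAlreadyExistsException ignored) {
      // Already marked in use, for example after an RM restart.
    }
  }
}
```

Note that a killed RM leaves `RM_IN_USE` behind. This is exactly the stale-lock concern raised later in the thread.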
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499750#comment-14499750 ] Hadoop QA commented on YARN-2268: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12726128/0001-YARN-2268.patch
  against trunk revision 76e7264.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
        See https://builds.apache.org/job/PreCommit-YARN-Build/7375//artifact/patchprocess/diffJavadocWarnings.txt for details.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
        org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7375//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7375//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7375//console

This message is automatically generated.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499641#comment-14499641 ] Rohith commented on YARN-2268: --
I verified the patch by deploying it in both an HA cluster and a non-HA cluster. If any active RM is found in the cluster, an exception is thrown back to the console.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499638#comment-14499638 ] Rohith commented on YARN-2268: --
Attached a patch that disallows formatting the state store, using the previously proposed approach. Kindly review.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496103#comment-14496103 ] Rohith commented on YARN-2268: --
I propose the following way to disallow formatting the state store while an RM is running. For both HA (Active and Standby) and non-HA setups, the RM state can be obtained through the REST API getClusterInfo ('ws/v1/cluster/info'). This can be used to identify the RM state, and it is independent of any state-store implementation.
In HA, each RM-Id is checked for the ACTIVE state sequentially. If no RM in ACTIVE state is found, the store is formatted; otherwise an *ActiveResourceManagerRunningException* is thrown.
Cons: when HA is enabled, formatting the state store works on a *best effort* basis. An RM's state can change after that RM has been checked.
Kindly share your thoughts on this approach.
> Disallow formatting the RMStateStore when there is an RM running > > > Key: YARN-2268 > URL: https://issues.apache.org/jira/browse/YARN-2268 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Rohith > > YARN-2131 adds a way to format the RMStateStore. However, it can be a problem > if we format the store while an RM is actively using it. It would be nice to > fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
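The polling approach proposed above could be sketched as follows. Several things are assumptions here:

* Each RM's web address (host:port) is already known to the caller.
* The `/ws/v1/cluster/info` response contains a `haState` field. Verify this against your Hadoop version.
* Security (HTTPS, SPNEGO) is ignored.
* `ActiveRmCheck` and its methods are hypothetical.

The JSON is matched with a regular expression only to keep the sketch dependency-free; real code would use a JSON parser. The fetcher is injectable, so the scan logic can be exercised without a cluster.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scan RM web addresses sequentially for one reporting ACTIVE.
final class ActiveRmCheck {
  private static final Pattern HA_STATE =
      Pattern.compile("\"haState\"\\s*:\\s*\"([A-Z_]+)\"");

  private ActiveRmCheck() {}

  /** Extracts haState from a cluster-info JSON body (naive regex, not a JSON parser). */
  static Optional<String> haState(String json) {
    Matcher m = HA_STATE.matcher(json);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  /**
   * Returns the first address whose RM reports ACTIVE, or empty if none does.
   * Unreachable RMs are treated as not active. Best effort only: a failover
   * between two checks can be missed.
   */
  static Optional<String> findActive(List<String> rmWebAddresses, Function<String, String> fetch) {
    for (String addr : rmWebAddresses) {
      String body;
      try {
        body = fetch.apply("http://" + addr + "/ws/v1/cluster/info");
      } catch (UncheckedIOException e) {
        continue; // RM down or unreachable
      }
      if (haState(body).filter("ACTIVE"::equals).isPresent()) {
        return Optional.of(addr);
      }
    }
    return Optional.empty();
  }

  /** Plain HTTP GET with timeouts; no HTTPS or Kerberos handling. */
  static String httpGet(String url) {
    try {
      HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
      conn.setConnectTimeout(5_000);
      conn.setReadTimeout(5_000);
      conn.setRequestProperty("Accept", "application/json");
      try (InputStream in = conn.getInputStream()) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      } finally {
        conn.disconnect();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A format command could call `findActive(addresses, ActiveRmCheck::httpGet)` and refuse to proceed when the result is present. That refusal is where the proposed *ActiveResourceManagerRunningException* would be thrown.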
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391166#comment-14391166 ] Karthik Kambatla commented on YARN-2268:
IIRC, the store formatting is actually executed by the RM, even if it is invoked from any node.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390122#comment-14390122 ] Rohith commented on YARN-2268: --
Thinking about this JIRA, I have several questions:
# How do we identify that an RM is running, given that the format can be run from anywhere in the cluster?
# In HA, each rm-id has to be checked for its serviceState. This is time-consuming, because the retries for each host take time. If a failover happens while the rm-ids are being checked, the check can wrongly conclude that all RMs are in standby.
I think that with admin support, the first question can be solved easily.
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383396#comment-14383396 ] Xu Chen commented on YARN-2268: ---
+1. When I used YARN-2131 while an RM was running and using this store, the RM crashed with this log:
2015-03-27 12:14:07,496 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error removing app: application_1426659183298_1684
java.lang.Exception: Failed to delete /rmstore/FSRMStateRoot/RMAppRoot/application_1426659183298_1684
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:497)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:403)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:693)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:770)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:765)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
2015-03-27 12:14:07,499 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
java.lang.Exception: Failed to delete /rmstore/FSRMStateRoot/RMAppRoot/application_1426659183298_1684
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.deleteFile(FileSystemRMStateStore.java:497)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.removeApplicationStateInternal(FileSystemRMStateStore.java:403)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:693)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:770)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:765)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
I fixed this in a simple way: by not throwing the exception from the remove operation.
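The workaround above amounts to treating an already-deleted application directory as success. The sketch below shows a narrower variant of that idea. It uses `java.nio` on a local directory rather than Hadoop's `FileSystem`, and it swallows only the "not found" case while still propagating other I/O errors. `TolerantRemove` is a hypothetical name, and this is not the code in `FileSystemRMStateStore`. It also only hides the symptom: the RM keeps running against a store that was wiped underneath it, which is what this JIRA aims to prevent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: remove an application's state directory, treating
// "already gone" as a warning instead of a fatal state-store error.
final class TolerantRemove {
  private static final Logger LOG = Logger.getLogger(TolerantRemove.class.getName());

  private TolerantRemove() {}

  /** Returns true if the directory was deleted, false if it was already absent. */
  static boolean removeAppState(Path appDir) throws IOException {
    try (Stream<Path> walk = Files.walk(appDir)) {
      // Reverse order visits children before parents, so each directory is empty when deleted.
      List<Path> ordered = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      for (Path p : ordered) {
        Files.deleteIfExists(p);
      }
      return true;
    } catch (NoSuchFileException e) {
      // Already gone, e.g. the store was formatted underneath the RM.
      LOG.warning("App state " + appDir + " is already absent; ignoring");
      return false;
    }
  }
}
```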