[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-10-25 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606155#comment-15606155
 ] 

Karthik Kambatla commented on YARN-5694:


I have minor comments (nits) on the trunk patch:
# ResourceManager: Since the method is now not just transitioning to standby, 
it should likely be named differently. How about {{handleFatalFailure()}}?
# ZKRMStateStore: In closeInternal, I would still prefer the null check on 
verifyActiveStatusThread. 

On the branch-2.7 patch, does it also include the YARN-5677 patch? 

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.branch-2.7.001.patch, 
> YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-10-25 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606382#comment-15606382
 ] 

Daniel Templeton commented on YARN-5694:


The branch-2.7 patch is out of date.  I stopped updating it until we settle on 
the trunk patch.

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.branch-2.7.001.patch, 
> YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-10-31 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623378#comment-15623378
 ] 

Karthik Kambatla commented on YARN-5694:


I am not very particular about the method name, but do think the current name 
does not fully capture the intention. 

+1, pending Jenkins. Just manually kick started it. 

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.branch-2.7.001.patch, 
> YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623488#comment-15623488
 ] 

Hadoop QA commented on YARN-5694:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  2m 
48s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 19s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 115 unchanged - 3 fixed = 116 total (was 118) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m 39s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-5694 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12835196/YARN-5694.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f83c5f3087ac 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 2528bea |
| Default Java | 1.8.0_101 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/13706/artifact/patchprocess/branch-mvninstall-root.txt
 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13706/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13706/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13706/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yar

[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-01 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626103#comment-15626103
 ] 

Karthik Kambatla commented on YARN-5694:


+1, pending Jenkins. 

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626411#comment-15626411
 ] 

Hadoop QA commented on YARN-5694:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 115 unchanged - 3 fixed = 116 total (was 118) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 11s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-5694 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12836393/YARN-5694.007.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 1af4e42b6702 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 34173a4 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/13738/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13738/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13738/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-serv

[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-02 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629027#comment-15629027
 ] 

Daniel Templeton commented on YARN-5694:


One of the unit test failures is YARN-5043.  The other looks a little like 
YARN-5057, except that YARN-5057 was already fixed.  I was able to reproduce 
the test failure without my patch applied, so I filed YARN-5816 for it.

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629737#comment-15629737
 ] 

Jian He commented on YARN-5694:
---

sorry, I don't quite get it. why is the active thread needed to run in non-HA 
mode ? 
Even in HA mode, it may not be needed, because the curator library can 
automatically detect whether the RM is still active and send notification.  
IIUC, no need a separate thread to detect that 

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-02 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629870#comment-15629870
 ] 

Daniel Templeton commented on YARN-5694:


The concern is that the leader election and state store can be configured to 
use different ZK instances in HA mode.  In that case, the state store still has 
to protect itself.  In non-HA, it may still be possible for a second RM to 
start using the same cluster ID and same ZK instance, which would corrupt the 
state store.  By having the state store be always vigilant, we protect 
ourselves from state store corruption in all cases.

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646047#comment-15646047
 ] 

Jian He commented on YARN-5694:
---

One other thing is the normal state-store operation already piggybacks a create 
and delete fencing node operations. In that case, it's already covered that if 
the state-store is fenced, the store operation will not succeed.

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-10 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654259#comment-15654259
 ] 

Daniel Templeton commented on YARN-5694:


Yes, but if the RM isn't in HA mode, the fencing is quietly ignored, which is 
also something I should address in the next version of this patch.

The reason to have the thread always run is so that we react earlier.  If we 
agree that it's bad to have two RMs accidentally sharing the same state store, 
why would you not want to catch the issue as early as possible?

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654697#comment-15654697
 ] 

Jian He commented on YARN-5694:
---

bq. If we agree that it's bad to have two RMs accidentally sharing the same 
state store, 
If it's in non-HA mode, currently there's no protection in the ZKStore 
preventing two RMs from sharing the same store. All the ACLs setting related 
code is only used in HA mode. Essentially, with current patch, I doubt it will 
get NoAuthException in the verifyThread, without making user change the ACLs 
manually. So the handling code in this patch will not be triggered with default 
setting. Maybe I'm wrong, you may try on a real cluster..

bq. why would you not want to catch the issue as early as possible?
My point is that first,will this code work as mentioned above. second, if 
there's no difference in terms of functionality, why do I need to start a 
thread pinging the zk continuously every few seconds.  Of course, I might miss 
something, you may clarify more...

Also, is the use-case mainly about two clusters sharing the same zk-store with 
the same path ?  IMHO, this is not a primary use-case to solve, if user 
mis-configured, it's user's fault. There are many other places that can go 
wrong.  e.g. if two clusters configure the same path for anything on HDFS.

If the use-case is about two RMs sharing the same zk-path in the same cluster 
with non-HA mode. I think in non-HA mode, the invalid RM will not take workload 
in the first place, clients, NMs will not switch to that RM if HA is not 
configured properly. 

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684210#comment-15684210
 ] 

Daniel Templeton commented on YARN-5694:


Fair points.  I see two ways forward.  Either I make all the HA-only behavior 
always enabled, which will protect the RM in the event of a second cluster 
being misconfigured to use the first cluster's state store, or I drop this JIRA 
back to its original purpose, which was fixing the conditional in 
{{ZKRMStateStore.startInternal()}} and dropping the {{synchronized}} from 
{{closeInternal()}}.  I'll post a patch for the latter, and we can discuss 
whether it's worth filing another issue for the former.

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684494#comment-15684494
 ] 

Hadoop QA commented on YARN-5694:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 48m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5694 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12839846/YARN-5694.008.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 6b55f5b31229 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 683e0c7 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/13997/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/13997/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/13997/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> ZKRMStateStore should always start its verification thread to prevent 
> accidental stat

[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684625#comment-15684625
 ] 

Daniel Templeton commented on YARN-5694:


Test failure is YARN-5362.  I should probably add some tests, though...

> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.008.patch, YARN-5694.branch-2.7.001.patch, 
> YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption

2016-11-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684826#comment-15684826
 ] 

Jian He commented on YARN-5694:
---

bq. (Currently, it's only started with manual failover, which doesn't make any 
sense.)
IIRC, it's started only with manual failover because, in case of curator based 
leader elector, the curator library will trigger notification  already if RM is 
not active. No need for an additional polling thread...  This maybe the case 
for Hadoop's ActiveStandbyElector too..
If you think it's better to keep this for Hadoop's ActiveStandbyElector , maybe 
we can do something like:  {{ if (HA.isEnabled()  &&  !curatorEnabled) }}



> ZKRMStateStore should always start its verification thread to prevent 
> accidental state store corruption
> ---
>
> Key: YARN-5694
> URL: https://issues.apache.org/jira/browse/YARN-5694
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-5694.001.patch, YARN-5694.002.patch, 
> YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, 
> YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, 
> YARN-5694.008.patch, YARN-5694.branch-2.7.001.patch, 
> YARN-5694.branch-2.7.002.patch
>
>
> There are two cases.  In branch-2.7, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when 
> using embedded or Curator failover.  In branch-2.8, the 
> {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is 
> disabled, which makes no sense.  Based on the JIRA that introduced that 
> change (YARN-4559), I believe the intent was to start it only when embedded 
> failover is disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org