[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970945#comment-14970945
 ] 

Jason Lowe commented on YARN-4041:
--

+1 for the latest patch. I will commit this later today if there are no 
objections.

> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.
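
The cost described above can be illustrated with a small, self-contained sketch. It is not the patch attached to this issue: `renewToken`, `recoverSync`, and `recoverAsync` are hypothetical names, and the slow token server is simulated with a sleep. The sketch only contrasts a recovery thread that waits for each renewal with one that hands renewals to a worker pool and moves on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AsyncRecoverySketch {

  // Hypothetical stand-in for a renewal call to a slow token server.
  static void renewToken(int appId, long delayMs) {
    try {
      Thread.sleep(delayMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Synchronous recovery: the recovery thread waits for every renewal in turn,
  // so the elapsed time grows with (number of apps) x (renewal latency).
  public static long recoverSync(int apps, long delayMs) {
    long start = System.nanoTime();
    for (int appId = 0; appId < apps; appId++) {
      renewToken(appId, delayMs);
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  // Asynchronous recovery: the recovery thread only enqueues the renewals.
  // Returns the time spent on the recovery thread; renewals finish later.
  public static long recoverAsync(int apps, long delayMs,
      ExecutorService pool, List<Future<?>> pending) {
    long start = System.nanoTime();
    for (int appId = 0; appId < apps; appId++) {
      final int id = appId;
      pending.add(pool.submit(() -> renewToken(id, delayMs)));
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<?>> pending = new ArrayList<>();
    long syncMs = recoverSync(10, 20);
    long asyncMs = recoverAsync(10, 20, pool, pending);
    for (Future<?> f : pending) {
      f.get(); // renewals still complete, just off the recovery path
    }
    pool.shutdown();
    System.out.println("sync recovery ms: " + syncMs);
    System.out.println("async recovery ms: " + asyncMs);
  }
}
```

In this sketch the asynchronous variant still performs every renewal; it only removes the wait from the thread that processes the restart.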



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970579#comment-14970579
 ] 

Hadoop QA commented on YARN-4041:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m  9s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  12m 11s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  3s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  66m 37s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 115m 17s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768222/0005-YARN-4041.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 124a412 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9540/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9540/console |


This message was automatically generated.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971998#comment-14971998
 ] 

Hudson commented on YARN-4041:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #589 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/589/])
YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev 
d3a34a4f388155f6a7ef040e244ce7be788cd28b)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


> Slow delegation token renewal can severely prolong RM recovery
> --
>
> Key: YARN-4041
> URL: https://issues.apache.org/jira/browse/YARN-4041
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Sunil G
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch, 
> 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew 
> delegation tokens for every active application.  If a token server happens to 
> be down or is running slow and a lot of the active apps were using tokens 
> from that server then it can have a huge impact on the time it takes the RM 
> to process the restart.





[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972200#comment-14972200
 ] 

Hudson commented on YARN-4041:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #531 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/531/])
YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev 
d3a34a4f388155f6a7ef040e244ce7be788cd28b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972103#comment-14972103
 ] 

Hudson commented on YARN-4041:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1312 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1312/])
YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev 
d3a34a4f388155f6a7ef040e244ce7be788cd28b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972117#comment-14972117
 ] 

Hudson commented on YARN-4041:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2521 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2521/])
YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev 
d3a34a4f388155f6a7ef040e244ce7be788cd28b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971852#comment-14971852
 ] 

Hudson commented on YARN-4041:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8697 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8697/])
YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev 
d3a34a4f388155f6a7ef040e244ce7be788cd28b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971966#comment-14971966
 ] 

Hudson commented on YARN-4041:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #576 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/576/])
YARN-4041. Slow delegation token renewal can severely prolong RM (jlowe: rev 
d3a34a4f388155f6a7ef040e244ce7be788cd28b)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969820#comment-14969820
 ] 

Jason Lowe commented on YARN-4041:
--

The problem with checking the renewer event queue directly is that the queue 
can be empty but processing has not yet completed.  Threads can still be 
executing the last events, having just pulled them from the queue to leave it 
empty.  Therefore the test is still racy.  A simpler approach would be to just 
keep checking if the tokens are equal.  If they aren't then sleep for a bit 
then try again, up to some limit of time to keep checking.

By the way, we should not sleep an entire second between checks.  All those 
seconds of waiting add up across all of our tests doing it, making it take 
significantly longer to run them overall.  We should be sleeping for only 10ms 
or so.  That's still a large amount of time for modern processors to get work 
done while we're waiting, and we still won't be spinning non-stop on the CPU.
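
The check-sleep-retry approach suggested above could look roughly like the following. This is a minimal sketch rather than the code in the patch: `PollingWait` and `waitFor` are hypothetical names, and the condition passed in stands in for "the recovered tokens equal the expected tokens".

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class PollingWait {

  // Re-checks the condition every intervalMs until it holds or timeoutMs has
  // elapsed. Returns true if the condition held, false if the wait timed out.
  public static boolean waitFor(BooleanSupplier condition, long intervalMs,
      long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up: the condition never became true in time
      }
      Thread.sleep(intervalMs); // short sleep so we neither spin nor stall
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    AtomicBoolean renewed = new AtomicBoolean(false);
    // Simulated background renewal that finishes after a short delay.
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      renewed.set(true);
    });
    worker.start();
    boolean ok = waitFor(renewed::get, 10, 5000);
    worker.join();
    System.out.println("condition met: " + ok);
  }
}
```

Because the loop checks the observable result rather than the state of the event queue, it does not depend on whether a worker thread has already pulled the last event off the queue.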




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969105#comment-14969105
 ] 

Sunil G commented on YARN-4041:
---

The test case failures are not related. The tests pass locally.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967776#comment-14967776
 ] 

Hadoop QA commented on YARN-4041:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 20s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m  1s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  11m 39s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 59s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  58m 36s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 104m  5s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12767832/0004-YARN-4041.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e27c2ae |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9511/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9511/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9511/console |


This message was automatically generated.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-21 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966552#comment-14966552
 ] 

Sunil G commented on YARN-4041:
---

Thank you [~jlowe] for the comments.
Yes, I had also planned to move this logic inside the waitForTokensToBeRenewed 
method, but the test failure occurred for only one test case, so I placed it 
outside. I think it's better to place that logic inside 
waitForTokensToBeRenewed itself, as suggested. I also did not much like the 
sleep-based solution; however, DelegationTokenRenewer did not explicitly 
expose a better checkpoint. I also thought of checking the event queue size 
there. Now I feel we can verify whether any token renewal event has been 
raised, which can be a good checkpoint. I will attach a patch for this along 
with the fix for the other comment.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964735#comment-14964735
 ] 

Sunil G commented on YARN-4041:
---

The test case failures look related. I will debug and check.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965864#comment-14965864
 ] 

Jason Lowe commented on YARN-4041:
--

Thanks for updating the patch, Sunil!

When fixing the test, why wasn't the fix in waitForTokensToBeRenewed?  Also I'm 
not thrilled with the idea of sleeping for 1 second per application and hoping 
it's enough time.  And we're getting out early when there is at least one token 
in the token set, but there's a race where we may have taken a snapshot before 
all the tokens are there.  Can't we key off the app start events coming out of 
the token renewal process to know when we're done?  Would be nice if there were 
a more reliable way so we can avoid arbitrary sleeps (which tend to slow down 
unit tests overall) and racy tests.

Also, on a subsequent look I noticed that AbsrtactDelegationTokenRenewerAppEvent 
should be AbstractDelegationTokenRenewerAppEvent.
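
One way to key off completion events instead of sleeping a fixed time per application is to count the events down on a latch. The sketch below is only an illustration under assumptions: `CompletionLatchSketch` and `renewAllAndAwait` are hypothetical names, the renewals are simulated, and `countDown()` stands in for whatever event the renewer emits when an application's tokens are done.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CompletionLatchSketch {

  // Runs one simulated renewal per app on a worker pool and waits until each
  // one has signalled completion, or until timeoutMs elapses.
  public static boolean renewAllAndAwait(int apps, long timeoutMs)
      throws InterruptedException {
    CountDownLatch done = new CountDownLatch(apps);
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      for (int i = 0; i < apps; i++) {
        pool.execute(() -> {
          // Simulated renewal work would happen here.
          // countDown() stands in for the event fired once renewal finishes.
          done.countDown();
        });
      }
      // Returns as soon as the last completion arrives; no fixed sleep.
      return done.await(timeoutMs, TimeUnit.MILLISECONDS);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    boolean finished = renewAllAndAwait(8, 2000);
    System.out.println("all renewals signalled: " + finished);
  }
}
```

Waiting on one signal per application also avoids the snapshot race mentioned above, since the wait ends only after every expected completion has been observed.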



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965071#comment-14965071
 ] 

Hadoop QA commented on YARN-4041:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 26s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 32s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  57m 50s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | |  98m 36s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12767585/0003-YARN-4041.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9cb5d35 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9490/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9490/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9490/console |


This message was automatically generated.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964093#comment-14964093
 ] 

Hadoop QA commented on YARN-4041:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 25s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  11m  7s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 52s | The applied patch generated  1 new checkstyle issues (total was 150, now 149). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  57m 51s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | |  99m 38s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
|   | org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12751320/0001-YARN-4041.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6144e01 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9485/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9485/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9485/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9485/console |


This message was automatically generated.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963934#comment-14963934
 ] 

Jason Lowe commented on YARN-4041:
--

Sorry for the delay.  Looks good to me as well, kicking Jenkins to comment on 
the patch.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964534#comment-14964534
 ] 

Hadoop QA commented on YARN-4041:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 28s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 52s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  58m  8s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m 40s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12767527/0002-YARN-4041.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7e2837f |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9488/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9488/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9488/console |


This message was automatically generated.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953688#comment-14953688
 ] 

Jian He commented on YARN-4041:
---

Looks good to me overall; holding off on committing in case there are any comments from others.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952161#comment-14952161
 ] 

Sunil G commented on YARN-4041:
---

Hi Bob,
I have shared a patch that performs delegation token renewal asynchronously 
during recovery. I will rebase it against trunk now. In the meantime, [~jlowe], 
[~rohithsharma] and [~kasha], could you please take a look at this patch?



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-10-09 Thread Bob (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951510#comment-14951510
 ] 

Bob commented on YARN-4041:
---

Hi [~sunilg], any update or ideas on this issue?



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695316#comment-14695316
 ] 

Jason Lowe commented on YARN-4041:
--

bq. IIRR, synchronous recovery was to fail-fast if recovery doesn't work. With 
the proposed change, what happens when the recovery fails?
Arguably the same thing that happens when the RM goes to renew tokens on a live 
application and fails without a restart.  IIRC this is not fatal to either the 
RM or the application when this occurs today.  In general I think we should 
make restarting as orthogonal as possible to token renewals, and ideally RM 
restart should not cause an out-of-band token renewal storm.




[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694730#comment-14694730
 ] 

Karthik Kambatla commented on YARN-4041:


IIRR, synchronous recovery was to fail-fast if recovery doesn't work. With the 
proposed change, what happens when the recovery fails? 



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681226#comment-14681226
 ] 

Rohith Sharma K S commented on YARN-4041:
-

One correction to my previous comment: it is *NOT 8 minutes*, it is *8-10 seconds*. 
So {{8 seconds * 60 apps = 480 seconds, i.e. 8 minutes}}
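
The arithmetic above can be written out as a small sketch. It assumes renewals run strictly one after another on the recovery path, as the issue description states; the class and method names are hypothetical and only illustrate the calculation.

```java
/** Hypothetical helper for the estimate above; not part of Hadoop. */
final class RecoveryTimeEstimate {
  /**
   * Total seconds spent renewing tokens when every renewal blocks the
   * recovery thread in turn: per-renewal latency times number of apps.
   */
  static long serialSeconds(long secondsPerRenewal, int activeApps) {
    return secondsPerRenewal * activeApps;
  }
}
```

With the figures from this comment, `serialSeconds(8, 60)` gives 480 seconds (8 minutes), and the upper end of the reported range, `serialSeconds(10, 60)`, gives 600 seconds.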



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680515#comment-14680515
 ] 

Jason Lowe commented on YARN-4041:
--

The active apps already have the tokens and are running on the cluster, so I'm 
not sure why it's so pressing that we synchronously process token renewal upon 
recovery.  This should be made asynchronous, or even better, we shouldn't do 
any renewals just because we restarted.  Ideally the RM should be tracking when 
tokens need to be renewed and renew them at that point.  If we restart and some 
tokens are due for a renewal then we should go ahead and renew those, but I 
don't think the RM should blindly renew all tokens for apps that are already 
active and running on the cluster when it restarts.
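
A minimal sketch of the "renew only what is due" idea, in plain Java. It assumes each token's expiration time is available after restart (for example, persisted with the application state), which this thread does not confirm. `TrackedToken`, `RecoveryRenewalPlanner`, and the margin parameter are hypothetical names, not the actual {{DelegationTokenRenewer}} API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of a token the RM is tracking; not a Hadoop class. */
final class TrackedToken {
  final String appId;
  final long expirationTimeMs; // when the token expires unless renewed

  TrackedToken(String appId, long expirationTimeMs) {
    this.appId = appId;
    this.expirationTimeMs = expirationTimeMs;
  }
}

final class RecoveryRenewalPlanner {
  /**
   * Returns only the tokens that expire within marginMs of nowMs.
   * Every other token is left on its existing renewal schedule, so a
   * restart does not trigger a renewal for each active application.
   */
  static List<TrackedToken> dueForRenewal(
      List<TrackedToken> recovered, long nowMs, long marginMs) {
    List<TrackedToken> due = new ArrayList<>();
    for (TrackedToken token : recovered) {
      if (token.expirationTimeMs - nowMs <= marginMs) {
        due.add(token);
      }
    }
    return due;
  }
}
```

Under these assumptions, the number of renewals performed at restart depends on how many tokens are close to expiry rather than on how many applications are active.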



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680620#comment-14680620
 ] 

Sunil G commented on YARN-4041:
---

Could we handle this asynchronously and use {{DelegationTokenRenewerRunnable}} to 
renew tokens if needed? A new event type can be added in this class as below:
{code}
  enum DelegationTokenRenewerEventType {
    VERIFY_AND_START_APPLICATION,
    RECOVER_APPLICATION, // proposed new event type
    FINISH_APPLICATION
  }
{code}

We can then handle this recover event in {{DelegationTokenRenewer}} to decide 
whether to renew the token. Would that be fine?
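
A rough sketch of how such an event could be processed off the recovery thread. This is a simplified stand-in, not the real {{DelegationTokenRenewer}}: the `AsyncRenewerSketch` class, the `TokenRenewal` hook, and the pool size of 4 are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Simplified stand-in for the renewer; not the real Hadoop class. */
final class AsyncRenewerSketch {
  enum EventType {
    VERIFY_AND_START_APPLICATION,
    RECOVER_APPLICATION,
    FINISH_APPLICATION
  }

  /** Hypothetical hook for the (possibly slow) call to the token server. */
  interface TokenRenewal {
    void renew(String appId) throws Exception;
  }

  // Pool size is an arbitrary choice for this sketch.
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final TokenRenewal renewal;

  AsyncRenewerSketch(TokenRenewal renewal) {
    this.renewal = renewal;
  }

  /**
   * Queues the event and returns immediately, so the caller (for example
   * the recovery path) never waits on the token server. The future
   * completes with true if the event was handled without a renewal error.
   */
  Future<Boolean> handle(EventType type, String appId) {
    return pool.submit(() -> {
      if (type == EventType.FINISH_APPLICATION) {
        return true; // nothing to renew in this sketch
      }
      try {
        renewal.renew(appId);
        return true;
      } catch (Exception e) {
        if (type == EventType.RECOVER_APPLICATION) {
          return false; // on recovery, report the failure instead of propagating it
        }
        throw e; // for other events, surface the failure through the future
      }
    });
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

In this sketch a failed renewal for a recovered application only completes its future with `false`; what the RM should actually do in that case is a design decision for the patch.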



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680907#comment-14680907
 ] 

Jian He commented on YARN-4041:
---

YARN-2010 was done to actually ignore the failure on token renewal for 
recovery. Now I agree that we do not even need to do the renewal.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680904#comment-14680904
 ] 

Jian He commented on YARN-4041:
---

bq. I don't think the RM should blindly renew all tokens for apps that are 
already active and running on the cluster when it restarts.
I agree with this. We do not need to renew tokens for apps on recovery.  



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681217#comment-14681217
 ] 

Rohith Sharma K S commented on YARN-4041:
-

We recently faced a similar issue in a test cluster where around 60 apps were 
running. On RM switch, each application took around 8 minutes to renew its 
delegation token, which is 8 min * 60 apps = 480 minutes for recovery. YARN-3639 
is the issue raised for the same.



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680730#comment-14680730
 ] 

Jason Lowe commented on YARN-4041:
--

Maybe.  The synchronous recovery was added as part of YARN-2010, and I don't 
recall from that JIRA why it was crucial for the token renewal process to be 
performed synchronously during recovery.  [~jianhe] or [~kasha] do you see any 
issues with making the delegation token renewal asynchronous for active 
applications during RM recovery?



[jira] [Commented] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery

2015-08-10 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681048#comment-14681048
 ] 

Sunil G commented on YARN-4041:
---

I would like to take this. [~jlowe], could I take this over?
