[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14253229#comment-14253229
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #46 (See
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/46/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java

RM prematurely cancels tokens for jobs that submit jobs (oozie)
---

Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14253246#comment-14253246
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #780 (See
[https://builds.apache.org/job/Hadoop-Yarn-trunk/780/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14253440#comment-14253440
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1978 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1978/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14253455#comment-14253455
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #43 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/43/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14253502#comment-14253502
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #47 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/47/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14253523#comment-14253523
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1997 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1997/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-19 Thread Jian He (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14254095#comment-14254095
]

Jian He commented on YARN-2964:
---

bq. do you think this is something we can/should fix in YARN?
I think so. The RM is the designated renewer, so it should renew the token
periodically. But because of a bug in DelegationTokenRenewer, the RM forgets
the token and never renews it automatically. So we should fix
DelegationTokenRenewer so that it keeps track of the token and renews it
properly.


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251466#comment-14251466
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251481#comment-14251481
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251708#comment-14251708
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251722#comment-14251722
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251766#comment-14251766
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251804#comment-14251804
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251818#comment-14251818
]

Jason Lowe commented on YARN-2964:
--

Thanks for the patch, Jian! The Findbugs warnings appear to be unrelated.

I'm wondering about the change in the removeApplicationFromRenewal method or
remove. If a sub-job completes, won't we remove the token from the allTokens
map before the launcher job has completed? Then a subsequent sub-job that
requests token cancelation can put the token back in the map and cause the
token to be canceled when it leaves. I think we need to repeat the logic from
the original code before YARN-2704 here, i.e.: only remove the token if the
application ID matches. That way the launcher job's token will remain _the_
token in that collection until the launcher job completes.
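
The owner check suggested above could look roughly like the following. This is only a sketch under stated assumptions: `OwnerCheckedTokens`, its methods, and the use of plain strings for tokens and application IDs are hypothetical and are not taken from DelegationTokenRenewer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from DelegationTokenRenewer.
// Tokens and application IDs are modeled as plain strings for brevity.
final class OwnerCheckedTokens {
  // token -> ID of the first application that registered it (the "owner")
  private final Map<String, String> ownerByToken = new HashMap<>();

  // The first application to submit a token becomes its owner; later
  // submissions with the same token leave the existing entry untouched.
  void register(String token, String appId) {
    ownerByToken.putIfAbsent(token, appId);
  }

  // Removes the token only if the finishing application is its owner.
  // Returns true when the entry was actually removed.
  boolean appFinished(String token, String appId) {
    return ownerByToken.remove(token, appId);
  }

  boolean isTracked(String token) {
    return ownerByToken.containsKey(token);
  }

  public static void main(String[] args) {
    OwnerCheckedTokens tokens = new OwnerCheckedTokens();
    tokens.register("hdfs-token", "launcher");
    tokens.register("hdfs-token", "sub-job-1");
    System.out.println(tokens.appFinished("hdfs-token", "sub-job-1"));
    System.out.println(tokens.isTracked("hdfs-token"));
  }
}
```

With this shape, a completing sub-job cannot evict the launcher's entry, so a later sub-job cannot re-register the token under its own name.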

This comment doesn't match the code, since the code appears to cancel the
token at the end if any job wants it canceled at the end.
{code}
// If any of the jobs sharing the same token set shouldCancelAtEnd
// to true, we should not cancel the token.
if (evt.shouldCancelAtEnd) {
  dttr.shouldCancelAtEnd = evt.shouldCancelAtEnd;
}
{code}
I think the logic and the comment should both say that if any job doesn't want
to cancel, then we won't cancel. The code seems to be trying to do the
opposite, so I'm not sure how the unit test is passing. Maybe I'm missing
something.
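
To make the intended semantics concrete, here is a small sketch of the merge rule "if any job doesn't want to cancel, we won't cancel". The class and method names are hypothetical and the real event and token classes are not reproduced; it only models the boolean logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the flag merge; the real
// DelegationTokenRenewer event and token classes are not reproduced here.
final class CancelAtEndMerge {
  // Combines the tracked flag with the flag requested by a newly submitted
  // job. Once any job asks to keep the token (requested == false), the
  // result stays false and cannot be switched back to true.
  static boolean merge(boolean tracked, boolean requested) {
    return tracked && requested;
  }

  // Folds the flags of all jobs sharing one token, starting from "cancel".
  static boolean shouldCancelAtEnd(List<Boolean> requestedByJobs) {
    boolean cancel = true;
    for (boolean requested : requestedByJobs) {
      cancel = merge(cancel, requested);
    }
    return cancel;
  }

  public static void main(String[] args) {
    // The second job asks to keep the token, so the token is not canceled.
    System.out.println(shouldCancelAtEnd(Arrays.asList(true, false, true)));
  }
}
```

The quoted snippet does the reverse: it only ever sets the flag to true, so a single job that wants cancellation wins.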

The info log message added in handleAppSubmitEvent also is misleading, as it
says we are setting shouldCancelAtEnd to whatever the event said, when in
reality we only set it sometimes. Probably needs to be inside the conditional.

Wonder if we should be using a Set instead of a Map to track these tokens.
Adding an already existing DelegationTokenToRenew in a set will not change the
one already there, but with the map a sub-job can clobber the
DelegationTokenToRenew that's already there with its own when it does the
allTokens.put(dtr.token, dtr).
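
The Set versus Map difference can be shown with plain JDK collections. A minimal sketch, assuming a hypothetical `TrackedToken` class that stands in for DelegationTokenToRenew and whose equality depends only on the token (an assumption for illustration, not a statement about the real class):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class SetVersusMapDemo {
  // Launcher and sub-job both add an entry for the same token to a Set.
  // Set.add leaves an equal, already-present element in place.
  static String ownerAfterSetAdd() {
    Set<TrackedToken> tokens = new HashSet<>();
    tokens.add(new TrackedToken("hdfs-token", "launcher"));
    tokens.add(new TrackedToken("hdfs-token", "sub-job"));
    return tokens.iterator().next().appId;
  }

  // Same sequence with a Map keyed by token.
  // Map.put replaces the value stored under an existing key.
  static String ownerAfterMapPut() {
    Map<String, TrackedToken> tokens = new HashMap<>();
    tokens.put("hdfs-token", new TrackedToken("hdfs-token", "launcher"));
    tokens.put("hdfs-token", new TrackedToken("hdfs-token", "sub-job"));
    return tokens.get("hdfs-token").appId;
  }

  public static void main(String[] args) {
    System.out.println("set keeps: " + ownerAfterSetAdd());
    System.out.println("map keeps: " + ownerAfterMapPut());
  }
}

// Hypothetical stand-in for DelegationTokenToRenew; not the real RM class.
// Assumption: equality is based on the token only, not on the owning app.
final class TrackedToken {
  final String token;  // stands in for the real token object
  final String appId;  // application that registered the token

  TrackedToken(String token, String appId) {
    this.token = token;
    this.appId = appId;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof TrackedToken && ((TrackedToken) o).token.equals(token);
  }

  @Override
  public int hashCode() {
    return Objects.hash(token);
  }
}
```

If a map is still needed for lookups, `Map.putIfAbsent` gives the set-like "first one wins" behavior while keeping the ability to fetch the tracked instance by token.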


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252045#comment-14252045
]

Jian He commented on YARN-2964:
---

Thanks for your comments, Jason!

bq. I'm wondering about the change in the removeApplicationFromRenewal method
or remove.
If launcher job first gets added to the appTokens map, DelegationTokenRenewer
will not add DelegationTokenToRenew instance for the sub-job. So the token
lookup in removeApplicationFromRenewal will come back empty for the sub-job
when the sub-job completes, and the token won’t be removed from allTokens. My
only concern with a global set is that each time an application completes, we
end up looping over all the applications or worse (each app may have at least
one token).
bq. This comment doesn't match the code
Good catch, what a mistake. I was under the impression that the semantics were
“shouldKeepAtEnd”. I added one line to the test case to guard against this.
bq. Wonder if we should be using a Set instead of a Map to track these tokens
I thought about that too. The reason I switched to a map is to look up the
DelegationTokenToRenew instance for the token the app provided and change its
shouldCancelAtEnd field on submission.


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252216#comment-14252216
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688092/YARN-2964.2.patch
  against trunk revision 07619aa.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6149//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6149//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6149//console

This message is automatically generated.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveness interval) after log aggregation completes.  The result is 
 that an Oozie job, e.g. Pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched 10 min or more after any other sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue goes 
 unnoticed.
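
 The failure window in the description can be illustrated with a small timeline model. This is a hypothetical sketch: the class, the minute-based clock, and the fixed 10-minute delay are assumptions taken from the approximate figure in the text, not values read from YARN.

```java
// Hypothetical timeline model of the failure window described above.
// All times are minutes since the launcher job started; the 10-minute delay
// is the approximate figure from the issue text, not a value read from YARN.
final class TokenCancelWindow {
  static final long CANCEL_DELAY_MIN = 10;

  // Minute at which the shared token is canceled, given the minute at which
  // the first sub-job to finish completed log aggregation.
  static long cancelTime(long firstCompletionMin) {
    return firstCompletionMin + CANCEL_DELAY_MIN;
  }

  // True if a sub-job launched at launchMin would find the token canceled.
  static boolean launchFails(long firstCompletionMin, long launchMin) {
    return launchMin >= cancelTime(firstCompletionMin);
  }

  public static void main(String[] args) {
    // First sub-job finishes at minute 5, so the token is canceled at 15.
    System.out.println(launchFails(5, 12)); // launched inside the window
    System.out.println(launchFails(5, 20)); // launched after cancellation
  }
}
```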




[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252218#comment-14252218
]

Jason Lowe commented on YARN-2964:
--

bq. If launcher job first gets added to the appTokens map,
DelegationTokenRenewer will not add DelegationTokenToRenew instance for the
sub-job.

Ah, sorry, I missed this critical change from the original patch. However if
we don't add the delegation token for each sub-job then I think we have a
problem with the following use-case:

# Oozie launcher submits a MapReduce sub-job
# MapReduce job starts
# Oozie launcher job leaves
# MapReduce job now running with a token that the RM has forgotten and won't
be automatically renewed

We might have had the same issue in this case prior to YARN-2704, since the
token would be pulled from the set when the launcher completed.


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252243#comment-14252243
]

Jian He commented on YARN-2964:
---

bq. We might have had the same issue in this case prior to YARN-2704.
Yes, this is an existing issue. As Robert pointed out in the previous comment,
an Oozie MapReduce sub-job currently cannot run beyond 24 hrs. IMO, we can fix
this separately?


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)


[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252259#comment-14252259
 ] 

Jason Lowe commented on YARN-2964:
--

Sure, we can fix that as a followup issue since it's no worse than what we had 
before.

+1 lgtm, only nit is the new getAllTokens method should be package-private 
instead of public but not a big deal either way.  I assume the test failures 
are unrelated?


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252286#comment-14252286
]

Jian He commented on YARN-2964:
---

I believe the failures are not related. I just changed the visibility and
uploaded a new patch to re-kick jenkins.


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252473#comment-14252473
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688133/YARN-2964.3.patch
  against trunk revision b9d4976.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
  org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6150//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6150//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6150//console

This message is automatically generated.


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252510#comment-14252510
]

Jason Lowe commented on YARN-2964:
--

+1 lgtm. I don't believe the test failures are related since they pass for me
locally. Committing this.


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

[
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252563#comment-14252563
]

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6755 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/6755/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie).
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)