[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020715#comment-17020715
 ] 

Hudson commented on YARN-9768:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17889 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17889/])
Revert "YARN-9768. RM Renew Delegation token thread should timeout and 
(inigoiri: rev b4870bce3a8336dbd638d26b8662037c4d4cdae9)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  
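
A minimal sketch of the idea above: bound the renew call with a client-side timeout 
and retry a few times before giving up. The executor, timeout, and retry count here 
are illustrative assumptions, not the configuration added by the actual patch.
{noformat}
// Hedged illustration only; names and values are assumptions.
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

public class TimedTokenRenewer {
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  public long renewWithTimeout(Token<?> token, Configuration conf,
      long timeoutMs, int maxAttempts) throws IOException {
    for (int attempt = 1; ; attempt++) {
      Future<Long> renewal = pool.submit(() -> token.renew(conf));
      try {
        return renewal.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException te) {
        renewal.cancel(true); // give up on the stuck call from the client side
        if (attempt >= maxAttempts) {
          throw new IOException("Token renewal timed out after "
              + attempt + " attempts", te);
        }
      } catch (ExecutionException | InterruptedException e) {
        throw new IOException("Token renewal failed", e);
      }
    }
  }
}
{noformat}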



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Mingliang Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020712#comment-17020712
 ] 

Mingliang Liu commented on YARN-9768:
-

Thanks [~elgoiri] for the prompt reply and fix. The {{trunk}} branch is good for me 
now.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reopened YARN-9768:
---

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020710#comment-17020710
 ] 

Íñigo Goiri commented on YARN-9768:
---

YARN-9052 replaced submitApp last week.
Reverting [^YARN-9768.008.patch].
[~maniraj...@gmail.com] do you mind rebasing with the new MockRMAppSubmitter?
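
For reference, a rough sketch of the rebase being asked for, mapping the removed 
MockRM#submitApp(int, String, String, Map, boolean, String, int, Credentials) 
overload onto the builder introduced by YARN-9052. The builder method names below 
are recalled from memory and should be verified against MockRMAppSubmissionData.
{noformat}
// Hedged sketch inside an existing MockRM-based test; names are assumptions.
// Old (removed by YARN-9052):
//   RMApp app = rm.submitApp(200, "app", "user", acls, false, "default", 2, creds);
// New:
MockRMAppSubmissionData data =
    MockRMAppSubmissionData.Builder.createWithMemory(200, rm)
        .withAppName("app")
        .withUser("user")
        .withAcls(acls)
        .withUnmanagedAM(false)
        .withQueue("default")
        .withMaxAppAttempts(2)
        .withCredentials(creds)
        .build();
RMApp app = MockRMAppSubmitter.submit(rm, data);
{noformat}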

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Mingliang Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020698#comment-17020698
 ] 

Mingliang Liu commented on YARN-9768:
-

I saw errors like this on the {{trunk}} branch:
{quote}
$ mvn clean package -DskipTests -q

Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /usr/local/Cellar/maven/3.6.3/libexec
Java version: 1.8.0_162, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.2", arch: "x86_64", family: "mac"
[ERROR] COMPILATION ERROR :
[ERROR] 
/Users/mingliang.liu/Workspace/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java:[1675,7]
 cannot find symbol
  symbol:   method 
submitApp(int,java.lang.String,java.lang.String,java.util.HashMap,boolean,java.lang.String,int,org.apache.hadoop.security.Credentials)
  location: variable rm of type 
org.apache.hadoop.yarn.server.resourcemanager.MockRM
[ERROR] 
/Users/mingliang.liu/Workspace/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java:[1741,7]
 cannot find symbol
  symbol:   method 
submitApp(int,java.lang.String,java.lang.String,java.util.HashMap,boolean,java.lang.String,int,org.apache.hadoop.security.Credentials)
  location: variable rm of type 
org.apache.hadoop.yarn.server.resourcemanager.MockRM
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hadoop-yarn-server-resourcemanager: 
Compilation failure: Compilation failure:
[ERROR] 
/Users/mingliang.liu/Workspace/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java:[1675,7]
 cannot find symbol
[ERROR]   symbol:   method 
submitApp(int,java.lang.String,java.lang.String,java.util.HashMap,boolean,java.lang.String,int,org.apache.hadoop.security.Credentials)
[ERROR]   location: variable rm of type 
org.apache.hadoop.yarn.server.resourcemanager.MockRM
[ERROR] 
/Users/mingliang.liu/Workspace/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java:[1741,7]
 cannot find symbol
[ERROR]   symbol:   method 
submitApp(int,java.lang.String,java.lang.String,java.util.HashMap,boolean,java.lang.String,int,org.apache.hadoop.security.Credentials)
[ERROR]   location: variable rm of type 
org.apache.hadoop.yarn.server.resourcemanager.MockRM
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-yarn-server-resourcemanager
{quote}

Is this broken by this commit, [~elgoiri] and [~maniraj...@gmail.com]? Thanks.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception

2020-01-21 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020662#comment-17020662
 ] 

Wilfred Spiegelenburg commented on YARN-7913:
-

The test failures are not related; the branch-3.2 fix seems to be good to go.

You might also need to trigger the branch-3.1 build; the backport is slightly 
different because of YARN-8248.

> Improve error handling when application recovery fails with exception
> -
>
> Key: YARN-7913
> URL: https://issues.apache.org/jira/browse/YARN-7913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-7913-branch-3.1.001.patch, 
> YARN-7913-branch-3.2.001.patch, YARN-7913.000.poc.patch, YARN-7913.001.patch, 
> YARN-7913.002.patch, YARN-7913.003.patch
>
>
> There are edge cases when the application recovery fails with an exception.
> Example failure scenario:
>  * setup: a queue is a leaf queue in the primary RM's config and the same 
> queue is a parent queue in the secondary RM's config.
>  * When failover happens with this setup, the recovery will fail for 
> applications on this queue, and an APP_REJECTED event will be dispatched to 
> the async dispatcher. On the same thread (that handles the recovery), a 
> NullPointerException is thrown when the applicationAttempt is being 
> recovered 
> (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494).
>  I don't see a good way to avoid the NPE in this scenario, because when the 
> NPE occurs the APP_REJECTED event has not been processed yet, and we don't know 
> that the application recovery failed.
> Currently the first exception will abort the recovery, and if there are X 
> applications, there will be ~X passive -> active RM transition attempts - the 
> passive -> active RM transition will only succeed when the last APP_REJECTED 
> event is processed on the async dispatcher thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime

2020-01-21 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020612#comment-17020612
 ] 

Jim Brennan commented on YARN-10084:


{quote}Should we be inheriting the parent queue's max if the leaf queue's max 
is zero?

I think so. I'll think about this some more, but a child queue should not have 
a max lifetime longer than its parent's max lifetime in my opinion.
{quote}
 

What if you want only one queue to have no max? How would you configure that? It 
would be nice if you could specify the max at the root once, and only set zero on 
the long-job queue to indicate that it has no max.
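
For example, something along these lines might express that (a hedged sketch; it 
assumes the per-queue form of the property follows the usual 
yarn.scheduler.capacity.<queue-path>.maximum-application-lifetime pattern and that 
the zero/inheritance semantics end up as discussed here; the queue name is 
hypothetical):
{noformat}
<!-- capacity-scheduler.xml sketch, values in seconds -->
<property>
  <!-- max lifetime set once at the root and inherited by child queues -->
  <name>yarn.scheduler.capacity.root.maximum-application-lifetime</name>
  <value>86400</value>
</property>
<property>
  <!-- the one long-running queue opts out of any lifetime limit -->
  <name>yarn.scheduler.capacity.root.longjobs.maximum-application-lifetime</name>
  <value>0</value>
</property>
{noformat}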

 

> Allow inheritance of max app lifetime / default app lifetime
> 
>
> Key: YARN-10084
> URL: https://issues.apache.org/jira/browse/YARN-10084
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10084.001.patch
>
>
> Currently, {{maximum-application-lifetime}} and 
> {{default-application-lifetime}} must be set for each leaf queue. If it is 
> not set for a particular leaf queue, then there will be no time limit on apps 
> running in that queue. It should be possible to set 
> {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root 
> queue and allow child queues to override that value if desired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime

2020-01-21 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020605#comment-17020605
 ] 

Eric Payne commented on YARN-10084:
---

{quote}
For some reason, I was under the impression that < 0 meant unlimited, but the 
documentation clearly says " Any value less than or equal to zero will be 
considered as disabled."
{quote}
Sorry, I spoke too soon. As you pointed out in your second comment, this specific 
code was unchanged; it was just moved from LeafQueue to AbstractCSQueue. Also, it 
should be checking for > 0 and not >= 0, because <= 0 means no max lifetime.

bq. Should we be inheriting the parent queue's max if the leaf queue's max is 
zero? 
I think so. I'll think about this some more, but a child queue should not have 
a max lifetime longer than its parent's max lifetime in my opinion.

> Allow inheritance of max app lifetime / default app lifetime
> 
>
> Key: YARN-10084
> URL: https://issues.apache.org/jira/browse/YARN-10084
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10084.001.patch
>
>
> Currently, {{maximum-application-lifetime}} and 
> {{default-application-lifetime}} must be set for each leaf queue. If it is 
> not set for a particular leaf queue, then there will be no time limit on apps 
> running in that queue. It should be possible to set 
> {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root 
> queue and allow child queues to override that value if desired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime

2020-01-21 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020602#comment-17020602
 ] 

Eric Payne commented on YARN-10084:
---

bq. should you be checking for >= 0 instead of >0
Good catch, [~Jim_Brennan], and thanks for the review! Yes, I should be 
checking for >= 0. For some reason, I was under the impression that < 0 meant 
unlimited, but the documentation clearly says "Any value less than or equal to 
zero will be considered as disabled."

Also, this reminds me that I need to update the docs to state that the value 
can be set at any level in the queue hierarchy and overridden for specific leaf 
queues.

> Allow inheritance of max app lifetime / default app lifetime
> 
>
> Key: YARN-10084
> URL: https://issues.apache.org/jira/browse/YARN-10084
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10084.001.patch
>
>
> Currently, {{maximum-application-lifetime}} and 
> {{default-application-lifetime}} must be set for each leaf queue. If it is 
> not set for a particular leaf queue, then there will be no time limit on apps 
> running in that queue. It should be possible to set 
> {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root 
> queue and allow child queues to override that value if desired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020596#comment-17020596
 ] 

Hudson commented on YARN-9768:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17886 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17886/])
YARN-9768. RM Renew Delegation token thread should timeout and retry. 
(inigoiri: rev 0696828a090bc06446f75b29c967697f1d6d845b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020585#comment-17020585
 ] 

Íñigo Goiri commented on YARN-9768:
---

I think [^YARN-9768.008.patch] covers the comment that [~bibinchundatt] had.
Thanks [~maniraj...@gmail.com] for the patch.
Committed to trunk.

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews 
> the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the 
> thread remains stuck indefinitely. The thread should ideally time out the 
> renewToken call and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10085:

Attachment: YARN-10085-004.patch

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10085-001.patch, YARN-10085-002.patch, 
> YARN-10085-003.patch, YARN-10085-004.patch, YARN-10085-004.patch
>
>
> In the converter, this part is very strict and probably unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> It's also misleading, because Fair policy can be used under DRF, so the error 
> message is incorrect.
> Let's remove these checks and rewrite the converter so that it 
> generates a valid config even if fair/drf is somehow mixed.
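
A hedged sketch of the relaxed behaviour proposed above (not the actual patch): 
never throw on mixed policies, just switch to the DominantResourceCalculator 
whenever DRF appears anywhere in the hierarchy.
{noformat}
// Validate ordering policy (relaxed): no ConversionException any more.
if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
  capacitySchedulerConfig.set(
      CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
      DominantResourceCalculator.class.getCanonicalName());
}
{noformat}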



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-01-21 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020522#comment-17020522
 ] 

Wangda Tan commented on YARN-9879:
--

[~shuzirra], I think we should not change the semantics of getQueueName in 
AbstractCSQueue, to avoid changing the API. (We should keep the queue-related REST 
APIs unchanged; otherwise it would be an incompatible change.)

Instead of changing getQueueName, you should check all callers of 
getQueueName first. And there's already a getQueuePath; you can leverage that.

I briefly checked getQueueName usages; there are 155 of them in production code. 
Most of them are just for logging purposes 
("org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.*" should 
be considered logging as well). It may take a few hours to identify and change 
everything, but manually changing getQueueName to getQueuePath case by case 
sounds like the safer option to me.
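
To illustrate the distinction with a hypothetical queue (illustration only, not 
part of the patch):
{noformat}
// Hypothetical queue root.tenantA.analytics.
CSQueue queue = capacityScheduler.getQueue("analytics");
queue.getQueueName();  // "analytics"              - short name, fine for log lines
queue.getQueuePath();  // "root.tenantA.analytics" - unambiguous identifier that
                       // callers should move to where short names may collide
{noformat}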

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9597) Memory efficiency in speculator

2020-01-21 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein reassigned YARN-9597:
---

Assignee: Ahmed Hussein

> Memory efficiency in speculator 
> 
>
> Key: YARN-9597
> URL: https://issues.apache.org/jira/browse/YARN-9597
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>
> The data structures in the speculator and runtime estimator keep growing. Data 
> elements such as taskID, TA-ID, task stats, tasks speculated, and tasks 
> finished are added to the concurrent maps but never removed.
> For long-running jobs, there are a couple of issues:
>  # memory leakage: the speculator's memory usage increases over time. 
>  # performance: keeping large structures in the heap hurts performance 
> due to locality and cache misses.
> *Suggested Fixes:*
> - When a TA goes through {{MoveContainerToSucceededFinishingTransition}}, 
> the TA notifies the speculator. The latter handles the event by cleaning its 
> internal structures accordingly.
> - When a task transitions to failed/killed, the speculator is notified to 
> clean its internal data structures.
>  
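
A minimal sketch of the suggested cleanup, with hypothetical field and method names 
(the real speculator/estimator structures are not reproduced here): the per-task 
entries are dropped as soon as the task reaches a terminal state.
{noformat}
// Hedged illustration only; names are assumptions.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SpeculatorState {
  private final ConcurrentMap<String, Long> taskStats = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, Boolean> speculatedTasks =
      new ConcurrentHashMap<>();

  // Called from the TA succeeded/failed/killed notification so that entries
  // do not accumulate for the whole lifetime of a long-running job.
  void onTaskFinished(String taskId) {
    taskStats.remove(taskId);
    speculatedTasks.remove(taskId);
  }
}
{noformat}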



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception

2020-01-21 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020444#comment-17020444
 ] 

Hadoop QA commented on YARN-7913:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
23s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 12s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 53s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
|   | 
hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:0f25cbbb251 |
| JIRA Issue | YARN-7913 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991444/YARN-7913-branch-3.2.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d2773ddcb8af 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / c36fbcb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/25413/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25413/testReport/ |
| Max. process+thread count | 793 (vs. ulimit o

[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime

2020-01-21 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020435#comment-17020435
 ] 

Jim Brennan commented on YARN-10084:


(responding to my own comment)

Actually, I think the question is really about getInheritedMaxAppLifetime(). (The 
other code is unchanged from before.) Should we be inheriting the parent queue's 
max if the leaf queue's max is zero? The current patch keeps the leaf queue max at 
zero in this case. Maybe this is correct, as the way to explicitly set no max in a 
leaf where the parent has one?

 

 

 

> Allow inheritance of max app lifetime / default app lifetime
> 
>
> Key: YARN-10084
> URL: https://issues.apache.org/jira/browse/YARN-10084
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10084.001.patch
>
>
> Currently, {{maximum-application-lifetime}} and 
> {{default-application-lifetime}} must be set for each leaf queue. If it is 
> not set for a particular leaf queue, then there will be no time limit on apps 
> running in that queue. It should be possible to set 
> {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root 
> queue and allow child queues to override that value if desired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime

2020-01-21 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020416#comment-17020416
 ] 

Jim Brennan commented on YARN-10084:


Thanks for the patch [~epayne]!  I've built and run the tests and overall this 
looks good to me.

One question though: in setupQueueConfigs, should you be checking for {{>= 0}} 
instead of {{>0}} in these conditionals?
{noformat}
  if (maxApplicationLifetime > 0 &&
  defaultApplicationLifetime > maxApplicationLifetime) {
throw new YarnRuntimeException(
"Default lifetime " + defaultApplicationLifetime
+ " can't exceed maximum lifetime " + maxApplicationLifetime);
  }
  defaultApplicationLifetime = defaultApplicationLifetime > 0
  ? defaultApplicationLifetime : maxApplicationLifetime;
{noformat}
I'm not sure what zero means exactly; it seems like it should disable it, but 
invalid is -1, and in getInheritedMaxAppLifetime we use {{maxAppLifetime >= 
0}}. Meanwhile, in checkAndGetApplicationLifetime we treat both -1 and 0 as 
invalid.
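
A hedged, simplified stand-in for the getInheritedMaxAppLifetime logic under 
discussion, assuming a leaf value <= 0 means "not set" so the parent's value is 
inherited; this is one possible semantics, not necessarily what the patch does.
{noformat}
// Illustration only; signature and semantics are assumptions.
private long inheritedMaxAppLifetime(long leafMaxLifetime, long parentMaxLifetime) {
  // A positive leaf value (seconds) wins; otherwise fall back to the parent's
  // (ultimately root's) value. Whether 0 should instead mean "no limit" is
  // exactly the open question above.
  return leafMaxLifetime > 0 ? leafMaxLifetime : parentMaxLifetime;
}
{noformat}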


> Allow inheritance of max app lifetime / default app lifetime
> 
>
> Key: YARN-10084
> URL: https://issues.apache.org/jira/browse/YARN-10084
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10084.001.patch
>
>
> Currently, {{maximum-application-lifetime}} and 
> {{default-application-lifetime}} must be set for each leaf queue. If it is 
> not set for a particular leaf queue, then there will be no time limit on apps 
> running in that queue. It should be possible to set 
> {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root 
> queue and allow child queues to override that value if desired.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020415#comment-17020415
 ] 

Peter Bacsko commented on YARN-10085:
-

Uploaded patch v4 to fix unit test failure.

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10085-001.patch, YARN-10085-002.patch, 
> YARN-10085-003.patch, YARN-10085-004.patch
>
>
> In the converter, this part is very strict and probably unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> It's also misleading, because Fair policy can be used under DRF, so the error 
> message is incorrect.
> Let's remove these checks and rewrite the converter so that it 
> generates a valid config even if fair/drf is somehow mixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10085:

Attachment: YARN-10085-004.patch

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10085-001.patch, YARN-10085-002.patch, 
> YARN-10085-003.patch, YARN-10085-004.patch
>
>
> In the converter, this part is very strict and probably unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> It's also misleading, because Fair policy can be used under DRF, so the error 
> message is incorrect.
> Let's remove these checks and rewrite the converter so that it 
> generates a valid config even if fair/drf is somehow mixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020361#comment-17020361
 ] 

Hadoop QA commented on YARN-10085:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 28s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 13s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.converter.TestFSConfigToCSConfigConverter
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-10085 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991428/YARN-10085-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 5407c90af795 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d887e49 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25412/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoo

[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-01-21 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020351#comment-17020351
 ] 

Gergely Pollak commented on YARN-9879:
--

Uploaded the first POC patch. This version CAN run jobs by their short AND their 
full name. Currently the leaf uniqueness constraint is still in place, but the data 
store is prepared for leaf collisions. I'm currently working on testing, and if 
nothing seems to be broken, I can remove the uniqueness constraint from leaf 
queues. I can also add the feature control flag, as discussed earlier.

The biggest risk I see is the modification in AbstractCSQueue where I've 
changed getQueueName to return the full name of the queue. Based on my 
analysis, this method is mostly called to get the queue's string identifier, and 
as was mentioned earlier, we should use full queue names for queue 
identification. So I need to carefully check every call of this method to see if it 
breaks anything. But a quick smoke test was successful: I was able to start a 
job using only the leaf queue name as well as using the full queue name.

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-01-21 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-9879:
-
Attachment: YARN-9879.POC001.patch

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception

2020-01-21 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020316#comment-17020316
 ] 

Szilard Nemeth commented on YARN-7913:
--

Thanks [~wilfreds],
Triggered the build as the precommit job did not pick up the patch.

> Improve error handling when application recovery fails with exception
> -
>
> Key: YARN-7913
> URL: https://issues.apache.org/jira/browse/YARN-7913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-7913-branch-3.1.001.patch, 
> YARN-7913-branch-3.2.001.patch, YARN-7913.000.poc.patch, YARN-7913.001.patch, 
> YARN-7913.002.patch, YARN-7913.003.patch
>
>
> There are edge cases when the application recovery fails with an exception.
> Example failure scenario:
>  * setup: a queue is a leaf queue in the primary RM's config and the same 
> queue is a parent queue in the secondary RM's config.
>  * When failover happens with this setup, the recovery will fail for 
> applications on this queue, and an APP_REJECTED event will be dispatched to 
> the async dispatcher. On the same thread (that handles the recovery), a 
> NullPointerException is thrown when the applicationAttempt is being 
> recovered 
> (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494).
>  I don't see a good way to avoid the NPE in this scenario, because when the 
> NPE occurs the APP_REJECTED event has not been processed yet, and we don't know 
> that the application recovery failed.
> Currently the first exception will abort the recovery, and if there are X 
> applications, there will be ~X passive -> active RM transition attempts - the 
> passive -> active RM transition will only succeed when the last APP_REJECTED 
> event is processed on the async dispatcher thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7913) Improve error handling when application recovery fails with exception

2020-01-21 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7913:

Attachment: YARN-7913-branch-3.2.001.patch
YARN-7913-branch-3.1.001.patch

> Improve error handling when application recovery fails with exception
> -
>
> Key: YARN-7913
> URL: https://issues.apache.org/jira/browse/YARN-7913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-7913-branch-3.1.001.patch, 
> YARN-7913-branch-3.2.001.patch, YARN-7913.000.poc.patch, YARN-7913.001.patch, 
> YARN-7913.002.patch, YARN-7913.003.patch
>
>
> There are edge cases when the application recovery fails with an exception.
> Example failure scenario:
>  * setup: a queue is a leaf queue in the primary RM's config and the same 
> queue is a parent queue in the secondary RM's config.
>  * When failover happens with this setup, the recovery will fail for 
> applications on this queue, and an APP_REJECTED event will be dispatched to 
> the async dispatcher. On the same thread (that handles the recovery), a 
> NullPointerException is thrown when the applicationAttempt is being 
> recovered 
> (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494).
>  I don't see a good way to avoid the NPE in this scenario, because when the 
> NPE occurs the APP_REJECTED event has not been processed yet, and we don't know 
> that the application recovery failed.
> Currently the first exception will abort the recovery, and if there are X 
> applications, there will be ~X passive -> active RM transition attempts - the 
> passive -> active RM transition will only succeed when the last APP_REJECTED 
> event is processed on the async dispatcher thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception

2020-01-21 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020242#comment-17020242
 ] 

Wilfred Spiegelenburg commented on YARN-7913:
-

added branch-3.1 and branch-3.2 patches

> Improve error handling when application recovery fails with exception
> -
>
> Key: YARN-7913
> URL: https://issues.apache.org/jira/browse/YARN-7913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Gergo Repas
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-7913-branch-3.1.001.patch, 
> YARN-7913-branch-3.2.001.patch, YARN-7913.000.poc.patch, YARN-7913.001.patch, 
> YARN-7913.002.patch, YARN-7913.003.patch
>
>
> There are edge cases when the application recovery fails with an exception.
> Example failure scenario:
>  * setup: a queue is a leaf queue in the primary RM's config and the same 
> queue is a parent queue in the secondary RM's config.
>  * When failover happens with this setup, the recovery will fail for 
> applications on this queue, and an APP_REJECTED event will be dispatched to 
> the async dispatcher. On the same thread (that handles the recovery), a 
> NullPointerException is thrown when the applicationAttempt is being 
> recovered 
> (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494).
>  I don't see a good way to avoid the NPE in this scenario, because when the 
> NPE occurs the APP_REJECTED event has not been processed yet, and we don't know 
> that the application recovery failed.
> Currently the first exception will abort the recovery, and if there are X 
> applications, there will be ~X passive -> active RM transition attempts - the 
> passive -> active RM transition will only succeed when the last APP_REJECTED 
> event is processed on the async dispatcher thread.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2020-01-21 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020210#comment-17020210
 ] 

Hadoop QA commented on YARN-9605:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
2s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 19m 56s{color} | 
{color:red} root generated 4 new + 22 unchanged - 4 fixed = 26 total (was 26) 
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 39s{color} | {color:orange} root: The patch generated 4 new + 21 unchanged - 
0 fixed = 25 total (was 21) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
4s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
54s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
49s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
14s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}238m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-9605 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985805/YARN-9605.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit

[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020141#comment-17020141
 ] 

Peter Bacsko commented on YARN-10085:
-

Thanks for the comments [~adam.antal].

_"Could you please give some explanation why we have to explicitly set the 
resource calculator class in FSYarnSiteConverter#convertSiteProperties?"_
 I looked into CS initialization and it uses {{DefaultResourceCalculator}} as 
default, so I removed the else branch.

_"As a minor nit, I'd add a simple optimization in 
FSConfigToCSConfigConverter#isDrfUsed:"_
 Changed the order of calls. However, I didn't want to touch the recursive 
check; I don't think it's worth changing (walking a tree even with 3000 nodes 
should take microseconds).
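
For reference, a minimal sketch of what the simplified site conversion could look 
like with the else branch removed: the resource calculator is written only when DRF 
is used, and otherwise the Capacity Scheduler falls back to its built-in 
DefaultResourceCalculator default. This is illustrative only; it assumes the usual 
Hadoop YARN classes on the classpath, and the method name is made up rather than the 
converter's real code.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;

public class SiteConversionSketch {

  static void convertResourceCalculator(Configuration capacitySchedulerConfig,
      boolean drfUsed) {
    if (drfUsed) {
      capacitySchedulerConfig.set(
          CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
          DominantResourceCalculator.class.getCanonicalName());
    }
    // No else branch: leaving the property unset means the Capacity Scheduler
    // uses DefaultResourceCalculator, which is its default anyway.
  }

  public static void main(String[] args) {
    Configuration csConfig = new Configuration(false);
    convertResourceCalculator(csConfig, true);
    System.out.println(csConfig.get(
        CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS));
  }
}
{code}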

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10085-001.patch, YARN-10085-002.patch, 
> YARN-10085-003.patch
>
>
> In the converter, this part is very strict and probably unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> It's also misleading, because Fair policy can be used under DRF, so the error 
> message is incorrect.
> Let's remove these checks and rewrite the converter in a way that it 
> generates a valid config even if fair/drf is somehow mixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10085:

Attachment: YARN-10085-003.patch

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10085-001.patch, YARN-10085-002.patch, 
> YARN-10085-003.patch
>
>
> In the converter, this part is very strict and probably unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> It's also misleading, because Fair policy can be used under DRF, so the error 
> message is incorrect.
> Let's remove these checks and rewrite the converter in a way that it 
> generates a valid config even if fair/drf is somehow mixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2020-01-21 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020046#comment-17020046
 ] 

Zhankun Tang commented on YARN-9605:


[~cane], let me trigger it again. Yeah, it seems the cc WARNING is not related.

> Add ZkConfiguredFailoverProxyProvider for RM HA
> ---
>
> Key: YARN-9605
> URL: https://issues.apache.org/jira/browse/YARN-9605
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-9605.001.patch, YARN-9605.002.patch, 
> YARN-9605.003.patch, YARN-9605.004.patch, YARN-9605.005.patch, 
> YARN-9605.006.patch
>
>
> In this issue, I will track a new feature to support 
> ZkConfiguredFailoverProxyProvider for RM HA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-21 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020041#comment-17020041
 ] 

Adam Antal commented on YARN-10085:
---

Thanks for the patch [~pbacsko]!

It seems logically correct to me. 

A question, just to give me more context:
Could you please explain why we have to explicitly set the resource calculator 
class in {{FSYarnSiteConverter#convertSiteProperties}}? I mean, if DRF is used: 
yes, it makes sense, but we could technically leave it empty if {{drfUsed}} is 
false, couldn't we?

As a minor nit, I'd add a simple optimization in 
{{FSConfigToCSConfigConverter#isDrfUsed}}:
If DRF is used at some leaf queue level, we have to climb the whole queue 
hierarchy just to find out that some leaf queue has that policy - so in some 
cases it can be an expensive operation. Moreover, {{#isDrfUsed}} may return 
true anyway if this line evaluates to true:
{code:java}
return DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)

{code}
So I assume we can get a minor performance improvement in that function by 
reorganizing the calls (this case could be moved to the top, so that we return 
true immediately when it holds). You may have more insight into which operation 
is cheaper, so I'm also fine with climbing the queue hierarchy first and then 
checking the {{AllocationConfiguration}} (if that is the more expensive one).
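
To illustrate the suggested reordering only, here is a sketch where the cheap 
default-policy check short-circuits before the recursive walk of the queue 
hierarchy. The queue and policy types below are simplified placeholders, not the 
converter's real code.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class DrfCheckSketch {

  static final String DRF_POLICY_NAME = "drf";

  static boolean isDrfUsed(String defaultPolicy, Queue rootQueue) {
    // Cheap check first: if the default policy is already DRF, we are done.
    if (DRF_POLICY_NAME.equals(defaultPolicy)) {
      return true;
    }
    // Only now pay for the recursive walk of the queue hierarchy.
    return isDrfUsedOnQueueLevel(rootQueue);
  }

  static boolean isDrfUsedOnQueueLevel(Queue queue) {
    if (DRF_POLICY_NAME.equals(queue.policy)) {
      return true;
    }
    for (Queue child : queue.children) {
      if (isDrfUsedOnQueueLevel(child)) {
        return true;
      }
    }
    return false;
  }

  static class Queue {
    String policy;
    List<Queue> children = new ArrayList<>();
  }

  public static void main(String[] args) {
    Queue root = new Queue();
    root.policy = "fair";
    Queue child = new Queue();
    child.policy = DRF_POLICY_NAME;
    root.children.add(child);
    System.out.println(isDrfUsed("fair", root));            // true, via the walk
    System.out.println(isDrfUsed(DRF_POLICY_NAME, root));   // true, short-circuit
  }
}
{code}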

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10085-001.patch, YARN-10085-002.patch
>
>
> In the converter, this part is very strict and probably unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> It's also misleading, because Fair policy can be used under DRF, so the error 
> message is incorrect.
> Let's remove these checks and rewrite the converter in a way that it 
> generates a valid config even if fair/drf is somehow mixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10097) Cross origin request support for Job history server web UI

2020-01-21 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10097:
--
Parent: (was: YARN-10025)
Issue Type: Improvement  (was: Sub-task)

> Cross origin request support for Job history server web UI
> --
>
> Key: YARN-10097
> URL: https://issues.apache.org/jira/browse/YARN-10097
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Critical
>
> The major web UIs in YARN support configuring the cross-origin (CORS) headers, 
> but somehow there has never been a use case for the JHS web UI. We need it in 
> YARN-10029, so let's implement it.
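
For context, a hedged sketch of the generic cross-origin mechanism the other Hadoop 
web UIs rely on. The hadoop.http.cross-origin.* keys below are the generic settings 
consumed by the cross-origin filter; treat them as an assumption rather than the 
exact keys this change will use, and note that the JHS-specific switch this issue 
would add is not shown, since it does not exist yet.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CorsConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Register the cross-origin filter on the HTTP server.
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.security.HttpCrossOriginFilterInitializer");
    // Enable CORS and describe what is allowed; values here are examples only.
    conf.setBoolean("hadoop.http.cross-origin.enabled", true);
    conf.set("hadoop.http.cross-origin.allowed-origins", "*");
    conf.set("hadoop.http.cross-origin.allowed-methods", "GET,POST,HEAD");
    System.out.println(conf.get("hadoop.http.cross-origin.allowed-origins"));
  }
}
{code}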



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org