[jira] [Assigned] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2020-12-14 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi reassigned YARN-10532:


Assignee: zhuqi

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2020-12-14 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249512#comment-17249512
 ] 

zhuqi commented on YARN-10532:
--

[~wangda] 

I want to take it , if no one to take.

Thanks.:)

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10533) Display managed apps and unmanaged apps on RM UI

2020-12-14 Thread Cyrus Jackson (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyrus Jackson updated YARN-10533:
-
Summary: Display managed apps and unmanaged apps on RM UI  (was: Separate 
metrics on RM UI for managed apps and unmanaged apps)

> Display managed apps and unmanaged apps on RM UI
> 
>
> Key: YARN-10533
> URL: https://issues.apache.org/jira/browse/YARN-10533
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Cyrus Jackson
>Assignee: Cyrus Jackson
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10533) Separate metrics on RM UI for managed apps and unmanaged apps

2020-12-14 Thread Cyrus Jackson (Jira)
Cyrus Jackson created YARN-10533:


 Summary: Separate metrics on RM UI for managed apps and unmanaged 
apps
 Key: YARN-10533
 URL: https://issues.apache.org/jira/browse/YARN-10533
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Cyrus Jackson
Assignee: Cyrus Jackson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249402#comment-17249402
 ] 

Eric Badger commented on YARN-9833:
---

bq. Isn't this how it has been for years? It was returning an unmodifiableList 
view of the underlying List, so that limits what the caller can do. 
getGoodDirs() and the others just return a read-only List. They don't have to 
know about the internals.
Well, yes an no. It was _supposed_ to be like that. But given this bug, we can 
see that it clearly wasn't. The callee in this case _should_ have been atomic 
and so the unmodifiable view of the list _should_ have been fine. But when you 
get into fine-grained locking like this, mistakes are easy to make because the 
person making the change doesn't necessarily understand the history behind why 
the code is written the way it is. If we can guarantee that the callee will 
always perform atomic operations on the lists, then there isn't an issue. Maybe 
we can guarantee this by adding a comment/warning in the checkDirs() function 
to make sure that anyone touching this code is super careful about locking.

I agree with the idea of fixing this on the callee side and not having the 
caller create a new object everytime. I just want to make sure that this bug 
isn't reintroduced by accident down the line because of the added complexity of 
fine-grained locking. I don't know if this is a performance-sensitive area of 
the code where such a tradeoff would clearly be to go for performance.

> Race condition when DirectoryCollection.checkDirs() runs during container 
> launch
> 
>
> Key: YARN-9833
> URL: https://issues.apache.org/jira/browse/YARN-9833
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9833-001.patch
>
>
> During endurance testing, we found a race condition that cause an empty 
> {{localDirs}} being passed to container-executor.
> The problem is that {{DirectoryCollection.checkDirs()}} clears three 
> collections:
> {code:java}
> this.writeLock.lock();
> try {
>   localDirs.clear();
>   errorDirs.clear();
>   fullDirs.clear();
>   ...
> {code}
> This happens in critical section guarded by a write lock. When we start a 
> container, we retrieve the local dirs by calling 
> {{dirsHandler.getLocalDirs();}} which in turn invokes 
> {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is:
> {code:java}
> List getGoodDirs() {
> this.readLock.lock();
> try {
>   return Collections.unmodifiableList(localDirs);
> } finally {
>   this.readLock.unlock();
> }
>   }
> {code}
> So we're also in a critical section guarded by the lock. But 
> {{Collections.unmodifiableList()}} only returns a _view_ of the collection, 
> not a copy. After we get the view, {{MonitoringTimerTask.run()}} might be 
> scheduled to run and immediately clears {{localDirs}}.
> This caused a weird behaviour in container-executor, which exited with error 
> code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES).
> Therefore we can't just return a view, we must return a copy with 
> {{ImmutableList.copyOf()}}.
> Credits to [~snemeth] for analyzing and determining the root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249316#comment-17249316
 ] 

Jim Brennan commented on YARN-9833:
---

{quote}
My worry with this is that code changes in the future will incorrectly use 
getGoodDirs or the other methods that expose the private lists from within 
DirectoryCollection.
{quote}
Isn't this how it has been for years?  It was returning an unmodifiableList 
view of the underlying List, so that limits what the caller can do.  
getGoodDirs() and the others just return a read-only List.  They don't have to 
know about the internals.
If we are going to change these to return a copy of the list, we may want to 
reconsider the use of CopyOnWriteArrayList - I'm not sure it is buying us 
anything.


> Race condition when DirectoryCollection.checkDirs() runs during container 
> launch
> 
>
> Key: YARN-9833
> URL: https://issues.apache.org/jira/browse/YARN-9833
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9833-001.patch
>
>
> During endurance testing, we found a race condition that cause an empty 
> {{localDirs}} being passed to container-executor.
> The problem is that {{DirectoryCollection.checkDirs()}} clears three 
> collections:
> {code:java}
> this.writeLock.lock();
> try {
>   localDirs.clear();
>   errorDirs.clear();
>   fullDirs.clear();
>   ...
> {code}
> This happens in critical section guarded by a write lock. When we start a 
> container, we retrieve the local dirs by calling 
> {{dirsHandler.getLocalDirs();}} which in turn invokes 
> {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is:
> {code:java}
> List getGoodDirs() {
> this.readLock.lock();
> try {
>   return Collections.unmodifiableList(localDirs);
> } finally {
>   this.readLock.unlock();
> }
>   }
> {code}
> So we're also in a critical section guarded by the lock. But 
> {{Collections.unmodifiableList()}} only returns a _view_ of the collection, 
> not a copy. After we get the view, {{MonitoringTimerTask.run()}} might be 
> scheduled to run and immediately clears {{localDirs}}.
> This caused a weird behaviour in container-executor, which exited with error 
> code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES).
> Therefore we can't just return a view, we must return a copy with 
> {{ImmutableList.copyOf()}}.
> Credits to [~snemeth] for analyzing and determining the root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249289#comment-17249289
 ] 

Eric Badger commented on YARN-9833:
---

I agree that the CopyOnWriteArrayList is most likely a performance thing. Since 
{{getGoodDirs()}} is called on every container launch, then that's a lot of 
copying. Not sure it's that bad in the grand scheme of things though. 

bq. My suggestion for fixing this would be to fix the checkdirs() 
implementation to operate on local copies of these arrays, and then update them 
with a single assignment only if they have changed.
My worry with this is that code changes in the future will incorrectly use 
{{getGoodDirs}} or the other methods that expose the private lists from within 
DirectoryCollection. So in my mind it's a tradeoff between performance and 
maintainability. I don't know what the performance impact is. We could 
potentially mitigate some (most?) of the maintainability impact via a comment 
on the getGoodDirs() method (as well as the getLocalDirs() method in 
LocalDirsHandlerService). In general, I don't like calling methods to have to 
be aware of callee methods and having to deal with their locking. That could 
also be mitigated by fixing the callee method to remove the race condition, but 
that could be reintroduced by accident in the future, since they may not 
understand the full impact of the CopyOnWriteArrayList

bq. 1. We were not thinking about errorDirs because as we were tracking down 
the issue, only localDirs seemed to be problematic, although I agree that it is 
inconsistent this way. Shall we follow-up on this?
Yea, we should definitely follow up to fix errorDirs

> Race condition when DirectoryCollection.checkDirs() runs during container 
> launch
> 
>
> Key: YARN-9833
> URL: https://issues.apache.org/jira/browse/YARN-9833
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9833-001.patch
>
>
> During endurance testing, we found a race condition that cause an empty 
> {{localDirs}} being passed to container-executor.
> The problem is that {{DirectoryCollection.checkDirs()}} clears three 
> collections:
> {code:java}
> this.writeLock.lock();
> try {
>   localDirs.clear();
>   errorDirs.clear();
>   fullDirs.clear();
>   ...
> {code}
> This happens in critical section guarded by a write lock. When we start a 
> container, we retrieve the local dirs by calling 
> {{dirsHandler.getLocalDirs();}} which in turn invokes 
> {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is:
> {code:java}
> List getGoodDirs() {
> this.readLock.lock();
> try {
>   return Collections.unmodifiableList(localDirs);
> } finally {
>   this.readLock.unlock();
> }
>   }
> {code}
> So we're also in a critical section guarded by the lock. But 
> {{Collections.unmodifiableList()}} only returns a _view_ of the collection, 
> not a copy. After we get the view, {{MonitoringTimerTask.run()}} might be 
> scheduled to run and immediately clears {{localDirs}}.
> This caused a weird behaviour in container-executor, which exited with error 
> code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES).
> Therefore we can't just return a view, we must return a copy with 
> {{ImmutableList.copyOf()}}.
> Credits to [~snemeth] for analyzing and determining the root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10526) RMAppManager CS Placement ignores parent path

2020-12-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249106#comment-17249106
 ] 

Hadoop QA commented on YARN-10526:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
22s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 36s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m  
6s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; 
considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 51s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the 

[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249043#comment-17249043
 ] 

Jim Brennan commented on YARN-9833:
---

Thinking about it more over the weekend, I suspect that the reason the 
CopyOnWriteArrayList was used was more for performance than to allow someone to 
hang onto the reference for a long time.  This ideally is a list that doesn't 
change very often, so handing out a view of a copy-on-write array is cheaper 
than making a copy every time we launch a container.  Unfortunately, 
{{checkdirs()}} as written seems to ruin any advantage we've gained by mutating 
the lists every time it runs (and multiple times at that, by first clearing and 
then adding each entry individually).   This is also where the race comes in.

My suggestion for fixing this would be to fix the {{checkdirs()}}  
implementation to operate on local copies of these arrays, and then update them 
with a single assignment only if they have changed.

 

> Race condition when DirectoryCollection.checkDirs() runs during container 
> launch
> 
>
> Key: YARN-9833
> URL: https://issues.apache.org/jira/browse/YARN-9833
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9833-001.patch
>
>
> During endurance testing, we found a race condition that cause an empty 
> {{localDirs}} being passed to container-executor.
> The problem is that {{DirectoryCollection.checkDirs()}} clears three 
> collections:
> {code:java}
> this.writeLock.lock();
> try {
>   localDirs.clear();
>   errorDirs.clear();
>   fullDirs.clear();
>   ...
> {code}
> This happens in critical section guarded by a write lock. When we start a 
> container, we retrieve the local dirs by calling 
> {{dirsHandler.getLocalDirs();}} which in turn invokes 
> {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is:
> {code:java}
> List getGoodDirs() {
> this.readLock.lock();
> try {
>   return Collections.unmodifiableList(localDirs);
> } finally {
>   this.readLock.unlock();
> }
>   }
> {code}
> So we're also in a critical section guarded by the lock. But 
> {{Collections.unmodifiableList()}} only returns a _view_ of the collection, 
> not a copy. After we get the view, {{MonitoringTimerTask.run()}} might be 
> scheduled to run and immediately clears {{localDirs}}.
> This caused a weird behaviour in container-executor, which exited with error 
> code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES).
> Therefore we can't just return a view, we must return a copy with 
> {{ImmutableList.copyOf()}}.
> Credits to [~snemeth] for analyzing and determining the root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10526) RMAppManager CS Placement ignores parent path

2020-12-14 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249001#comment-17249001
 ] 

Gergely Pollak commented on YARN-10526:
---

Latest version is supposed to be final, added a test and a few comments. Also 
please ignore the fail in TestDelegationTokenRenewer, it seems to be flaky and 
is unrelated.

> RMAppManager CS Placement ignores parent path
> -
>
> Key: YARN-10526
> URL: https://issues.apache.org/jira/browse/YARN-10526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10526.001.patch, YARN-10526.002.patch, 
> YARN-10526.003.patch, YARN-10526.004.patch
>
>
> When RMAppManager creates the RMApp object using the placementContext's 
> results, it only uses the getQueue method which will return only the name of 
> the leaf queue in the case of CapacityScheduler.
> If a queue exists with this name, then the application will be placed into 
> the queue. If the queue does not exists, then CS will take the parent path 
> into consideration during the auto queue creation, however this only happens 
> if there is no queue with the leaf name.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10526) RMAppManager CS Placement ignores parent path

2020-12-14 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10526:
--
Attachment: YARN-10526.004.patch

> RMAppManager CS Placement ignores parent path
> -
>
> Key: YARN-10526
> URL: https://issues.apache.org/jira/browse/YARN-10526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10526.001.patch, YARN-10526.002.patch, 
> YARN-10526.003.patch, YARN-10526.004.patch
>
>
> When RMAppManager creates the RMApp object using the placementContext's 
> results, it only uses the getQueue method which will return only the name of 
> the leaf queue in the case of CapacityScheduler.
> If a queue exists with this name, then the application will be placed into 
> the queue. If the queue does not exists, then CS will take the parent path 
> into consideration during the auto queue creation, however this only happens 
> if there is no queue with the leaf name.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation

2020-12-14 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248989#comment-17248989
 ] 

zhuqi commented on YARN-10506:
--

[~wangda]
{quote}Handle convert of dynamic queue to static queue (still see some test 
failures).
{quote}
The test failed because some mistakes queue name , should change to this:
{code:java}
// Do change 2nd level queue from dynamic to static
csConf.setQueues("root", new String[] { "a", "b", "c-auto", "e" });
csConf.setNonLabeledQueueWeight("root.e", 6f);
csConf.setQueues("root.e", new String[] { "e1-auto" });
csConf.setNonLabeledQueueWeight("root.e.e1-auto", 6f);
cs.reinitialize(csConf, mockRM.getRMContext());

// Get queue c
CSQueue e1 = cs.getQueue("root.e.e1-auto");

// e's abs resource should be 6/20 * (6/7), (since a/c/e.weight=6, all other 2 
peers
// have weight=1, and e1's weight is 6, e2's weight is 1).
Assert.assertEquals((6 / 20f) * (6 / 7f), e1.getAbsoluteCapacity(), 1e-6);
Assert.assertEquals(360 * GB,
c.getQueueResourceQuotas().getEffectiveMinResource().getMemorySize());
Assert.assertEquals(6f, c.getQueueCapacities().getWeight(), 1e-6);
{code}

> Update queue creation logic to use weight mode and allow the flexible 
> static/dynamic creation
> -
>
> Key: YARN-10506
> URL: https://issues.apache.org/jira/browse/YARN-10506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Priority: Major
> Attachments: YARN-10506.001.patch
>
>
> The queue creation logic should be updated to use weight mode and support the 
> flexible creation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10510) TestAppPage.testAppBlockRenderWithNullCurrentAppAttempt will cause NullPointerException

2020-12-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248866#comment-17248866
 ] 

Hadoop QA commented on YARN-10510:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 55s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
49s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/385/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 7 new + 48 unchanged - 0 fixed = 55 total (was 48) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 11s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| 

[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package

2020-12-14 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal updated YARN-10519:

Attachment: YARN-10519.v4.patch

> Refactor QueueMetricsForCustomResources class to move to yarn-common package
> 
>
> Key: YARN-10519
> URL: https://issues.apache.org/jira/browse/YARN-10519
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
> Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, 
> YARN-10519.v3.patch, YARN-10519.v4.patch
>
>
> Refactor the code for QueueMetricsForCustomResources to move the base classes 
> to yarn-common package. This helps in reusing the class in adding custom 
> resource types at NM level also. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10531) Be able to disable user limit factor for CapacityScheduler Leaf Queue

2020-12-14 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248823#comment-17248823
 ] 

zhuqi commented on YARN-10531:
--

[~wangda] 

Attach a latest patch, to make unit test more clear. And change the logic:
{code:java}
// If user-limit-factor set to -1, we should disabled user limit.
if (getUserLimitFactor() != -1) {
  maxUserLimit = Resources.multiplyAndRoundDown(queueCapacity,
  getUserLimitFactor());
} else {
  maxUserLimit = lQueue.
  getEffectiveMaxCapacityDown(nodePartition, lQueue.getMinimumAllocation());
}
{code}
Other changes is related to maxPerUserApplications and so on.

 

> Be able to disable user limit factor for CapacityScheduler Leaf Queue
> -
>
> Key: YARN-10531
> URL: https://issues.apache.org/jira/browse/YARN-10531
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10531.001.patch, YARN-10531.002.patch
>
>
> User limit factor is used to define max cap of how much resource can be 
> consumed by single user. 
> Under Auto Queue Creation context, it doesn't make much sense to set user 
> limit factor, because initially every queue will set weight to 1.0, we want 
> user can consume more resource if possible. It is hard to pre-determine how 
> to set up user limit factor. So it makes more sense to add a new value (like 
> -1) to indicate we will disable user limit factor 
> Logic need to be changed is below: 
> (Inside LeafQueue.java)
> {code}
> Resource maxUserLimit = Resources.none();
> if (schedulingMode == SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY) {
>   maxUserLimit = Resources.multiplyAndRoundDown(queueCapacity,
>   getUserLimitFactor());
> } else if (schedulingMode == SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY) 
> {
>   maxUserLimit = partitionResource;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org