[jira] [Commented] (YARN-7291) Better input parsing for resource in allocation file

2019-08-20 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911814#comment-16911814
 ] 

Wilfred Spiegelenburg commented on YARN-7291:
-

The change looks good +1 (non-binding).
All old tests are still passing and new ones have been added, so we should not 
have regressed.

> Better input parsing for resource in allocation file
> 
>
> Key: YARN-7291
> URL: https://issues.apache.org/jira/browse/YARN-7291
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Zoltan Siegl
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-7291.001.patch, YARN-7291.002.patch, 
> YARN-7291.003.patch, YARN-7291.004.patch, YARN-7291.005.patch, 
> YARN-7291.005.patch
>
>
> When you set the max/min share for queues in the fair scheduler allocation 
> file, "1024 mb, 2 4 vcores" is parsed the same as "1024 mb, 4 vcores" without 
> any issue; likewise "50% memory, 50% 100%cpu" is parsed the same as "50% 
> memory, 100%cpu". That is confusing. We should fix it.
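
These strings are values of the resource elements in the Fair Scheduler 
allocation file; a minimal sketch of where they appear (the queue name is 
hypothetical):

{code:xml}
<allocations>
  <queue name="example">
    <!-- hypothetical queue; the ambiguous strings are values like these -->
    <minResources>1024 mb, 4 vcores</minResources>
    <maxResources>50% memory, 100% cpu</maxResources>
  </queue>
</allocations>
{code}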






[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-08-04 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899703#comment-16899703
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

The "new" checkstyle is not really new and is triggered by a layout change. 
Renaming the stateMachineFactory to comply is a far bigger change. I already 
fixed up a lot of the layout issues in the RMContanerImpl class and will leave 
this one alone.

The second test failure is known as YARN-9333

[~snemeth] can you have a look please?

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch, YARN-1655.004.patch, YARN-1655.005.patch
>
>







[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-08-01 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-1655:

Attachment: YARN-1655.005.patch

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch, YARN-1655.004.patch, YARN-1655.005.patch
>
>







[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-08-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898466#comment-16898466
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

The failed unit test is flaky as per YARN-8433 and is not related to this 
change.

I fixed up the checkstyle issues. Most of the change in RMContainerImpl is a 
layout change to clean up the incorrect state machine layout. 
[^YARN-1655.005.patch] 

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch, YARN-1655.004.patch, YARN-1655.005.patch
>
>







[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-08-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897819#comment-16897819
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

Thank you for the feedback [~snemeth], sorry that it took this long.

I have updated the patch and fixed all the remarks.
All except remark 4 are straightforward, simple changes. To fix remark 4 I did 
the following (see the sketch below):
- made a new {{allocate}} method in the MockRM that takes no arguments and 
calls the real allocate with _nulls_
- updated the calls in the test code to use the new method and added a comment 
explaining what it does (i.e. process outstanding requests)
- split the other {{allocate}} call in the test code into two steps: a separate 
allocation of the request and a call to {{allocate}} on the app master

That should clear point 4 up.

 [^YARN-1655.004.patch] 
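
A minimal sketch of the no-argument convenience method described above 
(hypothetical shape; the actual patch may differ):

{code:java}
// Hypothetical sketch only: process outstanding requests by delegating
// to the full allocate call with null ask/release lists.
public AllocateResponse allocate() throws Exception {
  return allocate(null, null);
}
{code}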

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch, YARN-1655.004.patch
>
>







[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-08-01 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-1655:

Attachment: YARN-1655.004.patch

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch, YARN-1655.004.patch
>
>







[jira] [Commented] (YARN-7621) Support submitting apps with queue path for CapacityScheduler

2019-07-31 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897745#comment-16897745
 ] 

Wilfred Spiegelenburg commented on YARN-7621:
-

You will need to support duplicate leaf queue names in the CS. As [~cane] 
mentioned in his update, the FS has the concept of a real hierarchy. This means 
that you can have the following config:
{code}
                 root
          +-------+-------+
       parent1         parent2
      +---+---+       +---+---+
   childA  childB  childA  childB
{code}
Stripping off the last part of the queue path will thus collapse the structure 
and cause issues. Applications that ran in different queues now end up in the 
same queue. If the parent queue ACLs or resource settings differ, you will have 
an even bigger problem.

This could also break the placement rules currently used in the FS, for example 
when a queue is generated dynamically via a placement rule that has a parent 
rule set.

The way the queue hierarchy is implemented in the CS needs to be updated to 
remove the limitation that every leaf queue name must be unique. This is more 
work than what is covered in this patch.

> Support submitting apps with queue path for CapacityScheduler
> -
>
> Key: YARN-7621
> URL: https://issues.apache.org/jira/browse/YARN-7621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-7621.001.patch, YARN-7621.002.patch
>
>
> Currently there is a difference in the queue definition in 
> ApplicationSubmissionContext between CapacityScheduler and FairScheduler: 
> FairScheduler needs the queue path but CapacityScheduler needs the queue 
> name. There is no doubt about the correctness of the queue definition for 
> CapacityScheduler, because it does not allow duplicate leaf queue names, but 
> it makes it hard to switch between FairScheduler and CapacityScheduler. I 
> propose to support submitting apps with a queue path for CapacityScheduler to 
> make the interface clearer and scheduler switching smoother.






[jira] [Resolved] (YARN-9516) move application between queues,not check target queue acl permission

2019-05-01 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YARN-9516.
-
Resolution: Duplicate

This has been fixed in 3.0 by YARN-5554 (MoveApplicationAcrossQueues does not 
check user permission on the target queue).


> move application between queues,not check target queue acl permission
> -
>
> Key: YARN-9516
> URL: https://issues.apache.org/jira/browse/YARN-9516
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: powerinf
>Priority: Critical
>
> User test1 can submit an application to queue root.test.test1, but not to 
> queue root.test.test2. When I submit an application to queue root.test.test1 
> as user test1 and try to move the application to root.test.test2, the move 
> succeeds: the target queue ACL permission is not checked.






[jira] [Commented] (YARN-9431) Fix flaky junit test fair.TestAppRunnability after YARN-8967

2019-04-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807337#comment-16807337
 ] 

Wilfred Spiegelenburg commented on YARN-9431:
-

Thank you [~giovanni.fumarola] for the commit and [~pbacsko] for confirming 
the fix.

> Fix flaky junit test fair.TestAppRunnability after YARN-8967
> 
>
> Key: YARN-9431
> URL: https://issues.apache.org/jira/browse/YARN-9431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, test
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9431.001.patch
>
>
> In YARN-4901 one of the scheduler tests failed. This seems to be linked to 
> the changes around the placement rules introduced in YARN-8967.
> Applications submitted in the tests are accepted and rejected at the same 
> time:
> {code}
> 2019-04-01 12:00:57,269 INFO  [main] fair.FairScheduler 
> (FairScheduler.java:addApplication(540)) - Accepted application 
> application_0_0001 from user: user1, in queue: root.user1, currently num of 
> applications: 1
> 2019-04-01 12:00:57,269 INFO  [AsyncDispatcher event handler] 
> fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - 
> Reject application application_0_0001 submitted by user user1 application 
> rejected by placement rules.
> {code}
> This should never happen and is most likely due to the way the tests 
> generate the application and events.






[jira] [Created] (YARN-9431) flaky junit test fair.TestAppRunnability after YARN-8967

2019-03-31 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-9431:
---

 Summary: flaky junit test fair.TestAppRunnability after YARN-8967
 Key: YARN-9431
 URL: https://issues.apache.org/jira/browse/YARN-9431
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, test
Affects Versions: 3.3.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


In YARN-4901 one of the scheduler tests failed. This seems to be linked to the 
changes around the placement rules introduced in YARN-8967.

Applications submitted in the tests are accepted and rejected at the same time:
{code}
2019-04-01 12:00:57,269 INFO  [main] fair.FairScheduler 
(FairScheduler.java:addApplication(540)) - Accepted application 
application_0_0001 from user: user1, in queue: root.user1, currently num of 
applications: 1
2019-04-01 12:00:57,269 INFO  [AsyncDispatcher event handler] 
fair.FairScheduler (FairScheduler.java:rejectApplicationWithMessage(1344)) - 
Reject application application_0_0001 submitted by user user1 application 
rejected by placement rules.
{code}
This should never happen and is most likely due to the way the tests generate 
the application and events.






[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts

2019-03-31 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806326#comment-16806326
 ] 

Wilfred Spiegelenburg commented on YARN-4901:
-

I have run the test over 2500 times and cannot get the failure to reproduce.
I do see some weird things in my local run which could explain the failure. 
Opened a new jira for this: 
[YARN-9431|https://issues.apache.org/jira/browse/YARN-9431]

> MockRM should clear the QueueMetrics when it starts
> ---
>
> Key: YARN-4901
> URL: https://issues.apache.org/jira/browse/YARN-4901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Daniel Templeton
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-4901-001.patch
>
>
> The {{ResourceManager}} rightly assumes that when it starts, it's starting 
> from naught.  The {{MockRM}}, however, violates that assumption.  For 
> example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} 
> instance.  The {{QueueMetrics.queueMetrics}} field is static, which means 
> that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} 
> bleed over.  Having the MockRM clear the {{QueueMetrics}} when it starts 
> should resolve the issue.  I haven't looked yet at scope to see how easy 
> that is to do.






[jira] [Created] (YARN-9417) Implement FS equivalent of AppNameMappingPlacementRule

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-9417:
---

 Summary: Implement FS equivalent of AppNameMappingPlacementRule
 Key: YARN-9417
 URL: https://issues.apache.org/jira/browse/YARN-9417
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.3.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The AppNameMappingPlacementRule is only available for the CS. We need the same 
kind of rule for the FS.
The rule should use the application name as set in the submission context.

This allows Spark, MR or Tez jobs to be run in their own queues.
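
A hypothetical sketch of such a rule in the FS placement policy (the rule name 
{{appName}} is illustrative only, not an existing rule):

{code:xml}
<queuePlacementPolicy>
  <!-- hypothetical rule: route by application name, e.g. to root.spark -->
  <rule name="appName" create="true"/>
  <rule name="default"/>
</queuePlacementPolicy>
{code}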






[jira] [Comment Edited] (YARN-9416) Add filter options to FS placement rules

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802761#comment-16802761
 ] 

Wilfred Spiegelenburg edited comment on YARN-9416 at 3/27/19 1:22 PM:
--

The proposal is to add a new child entry to all rules, like the parent rule we 
have now.

Name of the xml node: 
* filter
Name of the attributes supported for each: 
* type (_allow_ or _deny_)
* users (comma separated list)
* groups (comma separated _ordered_ list)

The type attribute is required. Either the users or the groups attribute can 
be omitted or left empty. If both are left empty the filter is ignored.
The ordering only has an impact on the secondary group rule, and thus on the 
group filter in combination with the _allow_ type. That is the only rule that 
loops over a number of values that are returned in a random order by the OS. 
The order in which the list is specified will be the order in which the 
secondary groups are evaluated in the rule.

When a rule has a filter set, we check the filter before deciding whether the 
queue that was found is returned. This is independent of the ACLs.
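
A sketch of what a rule with such a filter could look like in the allocation 
file (proposed syntax from this comment, not an implemented feature):

{code:xml}
<queuePlacementPolicy>
  <rule name="primaryGroup" create="true">
    <!-- proposal sketch: only members of these groups match this rule -->
    <filter type="allow" groups="etl,analytics"/>
  </rule>
  <rule name="default"/>
</queuePlacementPolicy>
{code}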


was (Author: wilfreds):
The proposal is to add a new child entry to all rules, like the parent rule we 
have now.

Name of the xml node: 
* userfilter
* groupfilter
Name of the attributes supported for each: 
* type (order, allow or deny)
* members (comma separated ordered list)
When a rule has a filter set we check the filter before we decide if the queue 
found will be returned. This is independent of the ACLs.

> Add filter options to FS placement rules
> 
>
> Key: YARN-9416
> URL: https://issues.apache.org/jira/browse/YARN-9416
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> The placement rules should allow filtering of the groups and/or users that 
> match the rule.
> In the case of the user rule you might want it to only match if the user is a 
> member of a specific group. Another example would be to only allow specific 
> users to match the specified rule.






[jira] [Commented] (YARN-9416) Add filter options to FS placement rules

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802761#comment-16802761
 ] 

Wilfred Spiegelenburg commented on YARN-9416:
-

The proposal is to add a new child entry to all rules, like the parent rule we 
have now.

Name of the xml node: 
* userfilter
* groupfilter
Name of the attributes supported for each: 
* type (order, allow or deny)
* members (comma separated ordered list)
When a rule has a filter set we check the filter before we decide if the queue 
found will be returned. This is independent of the ACLs.

> Add filter options to FS placement rules
> 
>
> Key: YARN-9416
> URL: https://issues.apache.org/jira/browse/YARN-9416
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.3.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> The placement rules should allow filtering of the groups and/or users that 
> match the rule.
> In the case of the user rule you might want it to only match if the user is a 
> member of a specific group. Another example would be to only allow specific 
> users to match the specified rule.






[jira] [Created] (YARN-9416) Add filter options to FS placement rules

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-9416:
---

 Summary: Add filter options to FS placement rules
 Key: YARN-9416
 URL: https://issues.apache.org/jira/browse/YARN-9416
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.3.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The placement rules should allow filtering of the groups and/or users that 
match the rule.

In the case of the user rule you might want it to only match if the user is a 
member of a specific group. Another example would be to only allow specific 
users to match the specified rule.






[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802721#comment-16802721
 ] 

Wilfred Spiegelenburg commented on YARN-8793:
-

The PlacementRule and PlacementManager have standardised the way a chain is 
terminated and what is communicated back.

The FS has moved to using those interfaces to handle queue placement. 
Placements are now handled outside the scheduler.

> QueuePlacementPolicy bind more information to assigning result
> --
>
> Key: YARN-8793
> URL: https://issues.apache.org/jira/browse/YARN-8793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
> Attachments: YARN-8793.001.patch, YARN-8793.002.patch, 
> YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, 
> YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch
>
>
> Fair scheduler's QueuePlacementPolicy should bind more information to 
> assigning result:
>  # Whether to terminate the chain of responsibility
>  # The reason to reject a request






[jira] [Resolved] (YARN-5387) FairScheduler: add the ability to specify a parent queue to all placement rules

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YARN-5387.
-
Resolution: Implemented

This has been included as part of the YARN-8967 changes.
Documentation is still outstanding and will be added as part of YARN-9415.

> FairScheduler: add the ability to specify a parent queue to all placement 
> rules
> ---
>
> Key: YARN-5387
> URL: https://issues.apache.org/jira/browse/YARN-5387
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: supportability
>
> In the current placement policy all rules generate a queue name under the 
> root. The only exception is the nestedUserQueue rule. This rule allows a 
> queue to be created under a parent queue defined by a second rule.
> Instead of creating new rules to also allow nested groups, secondary groups 
> or nested queues for any new rule we think of, we should generalise this by 
> allowing a parent attribute to be specified in each rule, like the create 
> flag.
> The optional parent attribute for a rule should allow the following values:
> - empty (which is the same as not specifying the attribute)
> - a rule
> - a fixed value (with or without the root prefix)
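
A hypothetical illustration of the proposed attribute (a sketch of this 
proposal only; the change shipped in YARN-8967 uses a nested parent rule 
element rather than an attribute):

{code:xml}
<queuePlacementPolicy>
  <!-- proposal sketch: parent given as a fixed value -->
  <rule name="user" create="true" parent="root.users"/>
</queuePlacementPolicy>
{code}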






[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802708#comment-16802708
 ] 

Wilfred Spiegelenburg commented on YARN-8795:
-

The rules have been moved as part of the move to a new interface. The rules 
all now use the PlacementRule interface and are located in their own files.

> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
> Attachments: YARN-8795.002.patch, YARN-8795.003.patch, 
> YARN-8795.004.patch
>
>







[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802707#comment-16802707
 ] 

Wilfred Spiegelenburg commented on YARN-8792:
-

None of these changes fit into the integrated way we currently implement the 
rules in the FS and CS.
This has been changed as part of YARN-8948, YARN-9298 and finally integrated 
in YARN-8967. Both schedulers now use the same placement manager and placement 
rule code. The placement of the application in a queue has also moved out of 
the FS.

> Revisit FairScheduler QueuePlacementPolicy 
> ---
>
> Key: YARN-8792
> URL: https://issues.apache.org/jira/browse/YARN-8792
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
>
> Fair scheduler use `QueuePlacementPolicy` to map a request to queue. There 
> are several problems:
>  # The termination of the responsibility chain should bind to the assigning 
> result instead of the rule.
>  # It should provide a reason when rejecting a request.
>  # Still need more useful rules:
>  ## RejectNonLeafQueue
>  ## RejectDefaultQueue
>  ## RejectUsers
>  ## RejectQueues
>  ## DefaultByUser






[jira] [Resolved] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YARN-2257.
-
Resolution: Duplicate

This has been fixed as part of YARN-8948, YARN-9298 and finally integrated in 
YARN-8967. Both schedulers use the same placement manager and placement rule 
code. The rules differ between the two schedulers, as the FS uses a slightly 
different setup with rule chaining and creation of queues that do not exist 
yet.

The fix is in 3.3 and later; marking this as a duplicate of YARN-8967.

> Add user to queue mappings to automatically place users' apps into specific 
> queues
> --
>
> Key: YARN-2257
> URL: https://issues.apache.org/jira/browse/YARN-2257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Patrick Liu
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
>  Labels: features
>
> Currently, the fair-scheduler supports two modes, default queue or individual 
> queue for each user.
> Apparently, the default queue is not a good option, because the resources 
> cannot be managed for each user or group.
> However, individual queue for each user is not good enough. Especially when 
> connecting yarn with hive. There will be increasing hive users in a corporate 
> environment. If we create a queue for a user, the resource management will be 
> hard to maintain.
> I think the problem can be solved like this:
> 1. Define a user->queue mapping in fair-scheduler.xml. Inside each queue, use 
> aclSubmitApps to control which users may submit.
> 2. Each time a user submits an app to YARN, if the user is mapped to a queue, 
> the app will be scheduled to that queue; otherwise, the app will be submitted 
> to the default queue.
> 3. If the user does not pass the aclSubmitApps check, the app will not be 
> accepted.
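
A sketch of the proposed configuration ({{aclSubmitApps}} is existing Fair 
Scheduler syntax; the mapping element is hypothetical and only illustrates 
step 1):

{code:xml}
<allocations>
  <queue name="hive">
    <!-- existing FS config: restrict who can submit to this queue -->
    <aclSubmitApps>alice,bob</aclSubmitApps>
  </queue>
  <!-- hypothetical element sketching the proposed user->queue mapping -->
  <userQueueMapping user="alice" queue="hive"/>
</allocations>
{code}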






[jira] [Created] (YARN-9415) Document FS placement rule changes from YARN-8967

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-9415:
---

 Summary: Document FS placement rule changes from YARN-8967
 Key: YARN-9415
 URL: https://issues.apache.org/jira/browse/YARN-9415
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation, fairscheduler
Affects Versions: 3.3.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


With the changes introduced by YARN-8967 we now allow parent rules on all 
existing rules. This should be documented.






[jira] [Assigned] (YARN-6567) Flexible Workload Management

2019-03-27 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YARN-6567:
---

Assignee: Wilfred Spiegelenburg

> Flexible Workload Management
> 
>
> Key: YARN-6567
> URL: https://issues.apache.org/jira/browse/YARN-6567
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Ajai Omtri
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
>  Labels: features
>
> Yarn workload management can be a little more dynamic. 
> 1. Create yarn pool by specifying more than one Secondary AD group. 
> Scenario: 
> In a multi-tenant cluster there can be hundreds of AD groups per tenant and 
> hundreds of users per AD group. We want a way to group like workloads into 
> single yarn pool by specifying multiple secondary AD Groups. 
> Ex: All the ETL workloads of tenants needs to go into one yarn pool. This 
> requires addition of all ETL related AD groups into one yarn pool. 
> 2. Demotions
> Scenario: A particular workload/job has been started in a high priority yarn 
> pool based on the assumption that it would finish quickly but due to some 
> data issue/change in the code/query etc. - now it is running longer and 
> consuming high amounts of resources for long time. In this case we want to 
> demote this to a lower resource allocated yarn pool. We don’t want this one 
> run-away workload/job to dominate the cluster because our assumption was 
> wrong.
> Ex: If any workload in the yarn pool runs for X minutes and/or consumes Y 
> resources either alert me or push to another yarn pool. User can keep 
> demoting and can push to a yarn pool which has capped resources - like 
> Penalty box.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-26 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801415#comment-16801415
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

Thank you [~yufeigu] for the commit.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-22 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799020#comment-16799020
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

3) I need two pieces back from the child when we have a rule, so that is what 
was hampering the simple move. I was also hesitant because of the possibility 
of adding new child nodes, besides the parent rule, specifically for 
introducing filters on some of the rules. I think the use of a method that 
just retrieves the element is simple enough and does not hamper the changes I 
have been looking at.

5) yes they should have been private and final

Updated the patch with the two changes [^YARN-8967.011.patch] 

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-22 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.011.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-21 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798645#comment-16798645
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

The JUnit test failure is not related.
The checkstyle issue is from this patch, but it makes the internal class 
RuleMap so much simpler that I propose we leave it as it is. [~yufeigu]: the 
checkstyle issue was why I introduced the getters etc., which is the basis for 
your earlier comment number 5.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-21 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797873#comment-16797873
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

1) I missed that one too, fixed now
3) The two for loops run over different lists. Take this example:
{code}
<queuePlacementPolicy>
  <rule name="specified"/>
  <rule name="nestedUserQueue">
    <rule name="primaryGroup"/>
  </rule>
</queuePlacementPolicy>
{code}
The first for loop runs over the top level list of nodes (entries: specified 
and nestedUserQueue). The second loop runs over the children of each entry in 
that list. You cannot see the children of the top level nodes until you call 
{{getChildren()}} on the node, and for that you need to cast the Node to an 
Element. I thus cannot collapse the two loops into one. The list also does not 
have an iterator, so it cannot be changed to a for-each construct. The XML 
parsing also returns child Nodes that are not of the Element type, even for a 
correct configuration, which means we have to filter while traversing the list 
(see the traversal sketch below).
4) I added the same test case. We now correctly handle that case as well as a 
nestedUserQueue without a parent rule.
I have great difficulty removing the create and init for the first rule, as at 
the point where I find the first rule I do not know that I am going to find a 
second one. I would need to wait until after the loop to create/init, which 
makes the code even more complex.
5) I had that to start with and changed it because the IDE kept complaining. 
Not sure why, but it works now without complaints and without the getter 
methods. I might have had slightly different access modifiers. It looks far 
more like a wrapper class now.

I also found that we do not correctly test and handle the cases with entries 
that are not _rules_. I updated the test cases and found that we had a 
possible NPE due to the way we process the policy. These are covered in 
{{testBrokenRules()}} and the updated tests in 
{{testNestedUserQueueParsingErrors()}}
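
For illustration, the shape of the two-loop traversal described in point 3, 
using the standard org.w3c.dom API (a sketch only; what is done per nested 
element is a placeholder and the real patch may differ):

{code:java}
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PolicyParseSketch {
  static void parse(Element policyElement) {
    NodeList rules = policyElement.getChildNodes();
    for (int i = 0; i < rules.getLength(); i++) {
      Node node = rules.item(i);
      // DOM also returns text/comment nodes: filter while traversing
      if (!(node instanceof Element)) {
        continue;
      }
      Element rule = (Element) node;
      // children only become visible after casting the Node to an Element
      NodeList children = rule.getChildNodes();
      for (int j = 0; j < children.getLength(); j++) {
        Node child = children.item(j);
        if (child instanceof Element) {
          // handle the nested element, e.g. a parent rule
        }
      }
    }
  }
}
{code}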

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-21 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.010.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch, YARN-8967.010.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-19 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796823#comment-16796823
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

Thanks for the update [~uranus]

I don't think that we can add a lot of tests, as it would become really 
difficult to inspect the results. The three tests I can think of are:
* setting the batch value to 0 (-1 is the default and we do a test with that)
* setting the batch value to something larger than the number of nodes to show 
that we do not run out of the node list and somehow fail
* setting the batch value to {{#NMs - 1}} (i.e. batch=4 in a 5 node cluster) 
and doing multiple runs. Even with bad randomness we should wrap over the end 
of the list.

None of the tests should fail while iterating the nodes.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, 
> YARN-9278.003.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted 
> frequently. Also, we should *limit* the number of nodes to make preemption 
> more efficient. Just like this:
> {code:java}
> // we should not iterate all nodes, that would be very slow;
> // shuffle and keep only a limited batch of candidate nodes
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   // element type assumed here: FSSchedulerNode
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-18 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795640#comment-16795640
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

Cleaned up the checkstyle issues and fixed the junit test failures.
Also removed a partial diff that crept in from YARN-9314.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-18 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.009.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, 
> YARN-8967.009.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-18 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795049#comment-16795049
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

Thank you for the review [~yufeigu]

1) yes it did clean up nicely
2) The class is marked as {{@Unstable}}; that should cover the change. Leaving 
the old constructors in could allow you to create a new 
{{AllocationFileLoaderService}} without a scheduler reference. That would cause 
an NPE on scheduler init and every single time the reload thread ran, leaving 
the RM in a failed state. I don't think it would be wise to leave them in. 
Based on all this I do think I need to file a follow up jira to fix the Hive 
SHIM that uses the policy at the moment and move that to the new code in a 
backward compatible way.
3) fixed that
4) fixed that
5) The difference between recovery and normal is just two if statements: in 
the first we ignore an empty context on recovery, and the second is to not 
generate an event on recovery. Moving the code out would not help. The checks 
are on opposite sides of the method and simple.
6) We could still have an empty queue; that was why I left it. I just noticed 
that that case would be caught by {{getLeafQueue}}, so we should be OK with 
removing it.
7) fixed that, it should have been removed

1) I have chosen to use the utility class solution and clean up a bit more. 
Keeping the QueuePlacementPolicy around in the allocation does not really help 
as the rules are really only relevant in the QueuePlacementManager in the new 
setup. There is no logic beside the rule list which is not 1:1 with the config 
that we could keep around.
2) fixed the reference (I used javadoc as there was nothing for other comments, 
now it is just a plain comment)
3) removed the comment and code
4) fixed
5) the tests look really similar but they are not. They test slight 
variations: the first two checks make sure the specified rule and the create 
user rule trigger correctly. The last two make sure that the specified rule 
triggers but not the user rule, and that the default rule catches it correctly.
6) fixed that, I left it at first with a view on possible extension later with 
other bits. I now moved the parent create code out and left the loop for 
elements which clears things up.
7) added a RuleMap class based on the suggestion
8) I think it is better to file a follow up jira as the same has happened in 
all new rule classes. We must have overlooked them in the previous jira when we 
did the cleanup. I checked and the exception is logged in the client service so 
it can be done.



> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.






[jira] [Comment Edited] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-18 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795049#comment-16795049
 ] 

Wilfred Spiegelenburg edited comment on YARN-8967 at 3/18/19 2:15 PM:
--

Thank you for the review [~yufeigu]

AllocationFileLoaderService file:
1) yes it did clean up nicely
2) The class is marked as {{@Unstable}}; that should cover the change. Leaving 
the old constructors in could allow you to create a new 
{{AllocationFileLoaderService}} without a scheduler reference. That would cause 
an NPE on scheduler init and every single time the reload thread ran, leaving 
the RM in a failed state. I don't think it would be wise to leave them in. 
_Based on all this I do think I need to file a follow up jira to fix the Hive 
SHIM that uses the policy at the moment and move that to the new code in a 
backward compatible way._
3) fixed that
4) fixed that
5) The difference between recovery and normal is just two if statements: in 
the first we ignore an empty context on recovery, and the second is to not 
generate an event on recovery. Moving the code out would not help. The checks 
are on opposite sides of the method and simple.
6) We could still have an empty queue; that was why I left it. I just noticed 
that that case would be caught by {{getLeafQueue}}, so we should be OK with 
removing it.
7) fixed that, it should have been removed

QueuePlacementPolicy file:
1) I have chosen to use the utility class solution and clean up a bit more. 
Keeping the QueuePlacementPolicy around in the allocation does not really help 
as the rules are really only relevant in the QueuePlacementManager in the new 
setup. There is no logic beside the rule list which is not 1:1 with the config 
that we could keep around.
2) fixed the reference (I used javadoc as there was nothing for other comments, 
now it is just a plain comment)
3) removed the comment and code
4) fixed
5) the tests look really similar but they are not. They test slight 
variations: the first two checks make sure the specified rule and the create 
user rule trigger correctly. The last two make sure that the specified rule 
triggers but not the user rule, and that the default rule catches it correctly.
6) fixed that, I left it at first with a view on possible extension later with 
other bits. I now moved the parent create code out and left the loop for 
elements which clears things up.
7) added a RuleMap class based on the suggestion
8) I think it is better to file a follow up jira as the same has happened in 
all new rule classes. We must have overlooked them in the previous jira when we 
did the cleanup. I checked and the exception is logged in the client service so 
it can be done.




was (Author: wilfreds):
Thank you for the review [~yufeigu]

1) yes it did clean up nicely
2) The class is marked as {{@ Unstable}} that should cover the change. Leaving 
the old constructors in could allow you to create a new 
{{AllocationFileLoaderService}} without a scheduler reference. That would cause 
a NPE on scheduler init and every single time the reload thread would run, 
leaving the RM in a failed state. I don't think it would be wise to leave them 
in. 
Based on all this I do think I need to file a follow up jira to fix the Hive 
SHIM that uses the policy at the moment and move that to the new code in a 
backward compatible way.
3) fixed that
4) fixed that
5) The difference between recovery and normal is just two if statements: in 
the first we ignore an empty context on recovery, and in the second we do not 
generate an event on recovery. Moving the code out would not help. The checks 
are on opposite sides of the method and simple.
6) We could still have an empty queue, which is why I left it. I just noticed 
that that case would be caught by {{getLeafQueue}}, so we should be OK with 
removing it.
7) fixed that, it should have been removed

1) I have chosen to use the utility class solution and clean up a bit more. 
Keeping the QueuePlacementPolicy around in the allocation does not really help 
as the rules are really only relevant in the QueuePlacementManager in the new 
setup. There is no logic beside the rule list which is not 1:1 with the config 
that we could keep around.
2) fixed the reference (I used javadoc as there was nothing for other comments, 
now it is just a plain comment)
3) removed the comment and code
4) fixed
5) the tests look really similar but they are not. They test slight variations: 
the first two checks make sure the specified rule and the create-user rule 
trigger correctly. The last two make sure that the specified rule triggers but 
not the user rule, and that the default rule catches it correctly.
6) fixed that, I left it at first with a view on possible extension later with 
other bits. I now moved the parent create code out and left the loop for 
elements which clears things up.
7) added a RuleMap class 

[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-18 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.008.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-17 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794653#comment-16794653
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

Thank you for the update, [~uranus]. The code change looks good; a couple of 
minor remarks:
* We need to either add some tests or explain why we cannot add tests.
* Can you fix the newly introduced checkstyle issues please?
* The text for the property is much better however this part does not make 
sense to me:
{{The max trial nodes num to identify containers for one starved container}}
I think you want to say:
{{The maximum number of NodeManagers to check per pre-emption check for one 
starved container.}}

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++){
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791169#comment-16791169
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

[~haibochen] or [~templedf] could either of you review this please?

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790501#comment-16790501
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

I have a couple of comments on the patch.
* I think we need a default that turns this functionality off. An 
administrator should actively turn this on. A default of -1 or 0 is better, 
and we should skip the calculation entirely for that value (see the sketch 
after this list).
* Your code does not wrap at the end of the list, which is why I changed it to 
a do-while loop. As an example: I have a batch size of 100 and I have 350 
nodes. I could start at node 300 and still want to check those 100 nodes, so I 
want to check nodes 300-350 and 0-49. We should never check a node twice and 
never go past the start. I updated the example code with a wrapping end 
calculation.
* We need to cater for a setup that has the batch value set while the node 
list, for whatever reason or point in time, is smaller than the batch size. 
Your code does not handle this. We should just process all nodes at that point 
and not stop at the end of the list. 
* The text for the property is not clear at all. (See below)
* Please look at adding a test for this change.
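
A rough sketch of the default handling I have in mind, reusing the names from 
the description (the config getter and its name are assumptions, not a final 
API):
{code:java}
// -1 or 0 turns the feature off: keep today's behaviour and consider
// every node, skipping the shuffle/limit work completely
long maxTryNodeNum =
    context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
if (maxTryNodeNum <= 0 || potentialNodes.size() <= maxTryNodeNum) {
  return potentialNodes; // small list or feature off: nothing to limit
}
{code}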

Text for the property remarks:
{code}
The max trial nodes num to identify containers for one starved container. 
Defaults to 0.
{code}
It does not explain what it does and why it is there. It is too cryptic. We 
should explain the following:
* what is it used for (use a partial list of nodes to check for preemptable 
containers in a large cluster)
* when should it be used, or when it takes effect.
* what is the impact: AM container impact, 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++){
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790510#comment-16790510
 ] 

Wilfred Spiegelenburg commented on YARN-9344:
-

Thank you [~uranus] for adding the test.
Could you clean up the test a bit? There is an unused queue in the config, and 
the {{}} element distracts from the real test; it should work 
without it.
The asserts from the junit test are much clearer when they have a message that 
is printed when the test fails, for example:
{code}
assertEquals("Application has live containers and it should have none",
0, scheduler.getSchedulerApp(attId1).getLiveContainers().size());
{code}

The test only checks memory; we should also cover the other resource types in 
the test, not just the memory resource (vcores, custom resource types).
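
As a hedged sketch of what covering another resource type could look like, 
assuming the attempt exposes its reservation via {{getCurrentReservation()}} 
as elsewhere in the scheduler code:
{code:java}
assertEquals("Application should not have any vcores reserved",
    0, scheduler.getSchedulerApp(attId1)
        .getCurrentReservation().getVirtualCores());
{code}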

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch, 
> YARN-9344.003.patch, YARN-9344.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780101#comment-16780101
 ] 

Wilfred Spiegelenburg edited comment on YARN-9278 at 3/12/19 12:22 PM:
---

Two things:
* I still think limiting the number of nodes is something we need to approach 
with care.
* randomising a 10,000 entry long list each time we pre-empt will also become 
expensive.
 
I was thinking more of something like this:
{code:java}
  int preEmptionBatchSize = conf.getPreEmptionBatchSize();
  List<FSSchedulerNode> potentialNodes = 
scheduler.getNodeTracker().getNodesByResourceName(rr.getResourceName());
  int size = potentialNodes.size();
  int stop = 0;
  int current = 0;
  // find a start point somewhere in the list if it is long
  if (size > preEmptionBatchSize) {
Random rand = new Random();
current = rand.nextInt(size / preEmptionBatchSize) * preEmptionBatchSize;
stop = current;
  }
  do {
FSSchedulerNode mine = potentialNodes.get(current);
// Identify the containers

current++;
// flip at the end of the list  
if (current >= size) {
  current = 0;
}
  } while (current != stop);
{code}

Pre-emption runs in a loop and we could be considering different applications 
one after the other. Shuffling that node list continually is not good from a 
performance perspective. A simple cut-in like the above gives the same kind of 
behaviour. 
We could then still limit the number of "batches" we process. With some more 
smarts the stop condition could be based on the fact that we have processed, 
as an example, 10 * the batch size in nodes (a batch of nodes could be deemed 
equivalent to the number of nodes in a rack):
{code}  stop = ((10 * preEmptionBatchSize) > size) ? current : (((10 * 
preEmptionBatchSize) + current) % size);
{code}  

That gives a lot of flexibility and still a decent performance in a large 
cluster.


was (Author: wilfreds):
Two things:
* I still think limiting the number of nodes is something we need to approach 
with care.
* randomising a 10,000 entry long list each time we pre-empt will also become 
expensive.
 
I was thinking more of something like this:
{code:java}
  int preEmptionBatchSize = conf.getPreEmptionBatchSize();
  List<FSSchedulerNode> potentialNodes = 
scheduler.getNodeTracker().getNodesByResourceName(rr.getResourceName());
  int size = potentialNodes.size();
  int stop = 0;
  int current = 0;
  // find a start point somewhere in the list if it is long
  if (size > preEmptionBatchSize) {
Random rand = new Random();
current = rand.nextInt(size / preEmptionBatchSize) * preEmptionBatchSize;
  }
  do {
FSSchedulerNode mine = potentialNodes.get(current);
// Identify the containers

current++;
// flip at the end of the list  
if (current >= size) {
  current = 0;
}
  } while (current != stop);
{code}

Pre-emption runs in a loop and we could be considering different applications 
one after the other. Shuffling that node list continually is not good from a 
performance perspective. A simple cut-in like the above gives the same kind of 
behaviour. 
We could then still limit the number of "batches" we process. With some more 
smarts the stop condition could be based on the fact that we have processed, 
as an example, 10 * the batch size in nodes (a batch of nodes could be deemed 
equivalent to the number of nodes in a rack):
{code}  stop = ((10 * preEmptionBatchSize) > size) ? current : (((10 * 
preEmptionBatchSize) + current) % size);
{code}  

That gives a lot of flexibility and still a decent performance in a large 
cluster.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++){
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780101#comment-16780101
 ] 

Wilfred Spiegelenburg edited comment on YARN-9278 at 3/12/19 12:25 PM:
---

Two things:
* I still think limiting the number of nodes is something we need to approach 
with care.
* randomising a 10,000 entry long list each time we pre-empt will also become 
expensive.
 
I was thinking more of something like this:
{code:java}
  int preEmptionBatchSize = conf.getPreEmptionBatchSize();
  List<FSSchedulerNode> potentialNodes = 
scheduler.getNodeTracker().getNodesByResourceName(rr.getResourceName());
  int size = potentialNodes.size();
  int stop = 0;
  int current = 0;
  // find a start point somewhere in the list if it is long
  if (size > preEmptionBatchSize) {
Random rand = new Random();
current = rand.nextInt(size / preEmptionBatchSize) * preEmptionBatchSize;
stop = (preEmptionBatchSize > size) ? current : ((current + 
preEmptionBatchSize) % size);
  }
  do {
FSSchedulerNode mine = potentialNodes.get(current);
// Identify the containers

current++;
// flip at the end of the list  
if (current >= size) {
  current = 0;
}
  } while (current != stop);
{code}

Pre-emption runs in a loop and we could be considering different applications 
one after the other. Shuffling that node list continually is not good from a 
performance perspective. A simple cut-in like the above gives the same kind of 
behaviour. 
We could then still limit the number of "batches" we process. With some more 
smarts the stop condition could be based on the fact that we have processed, 
as an example, 10 * the batch size in nodes (a batch of nodes could be deemed 
equivalent to the number of nodes in a rack):
{code}  stop = ((10 * preEmptionBatchSize) > size) ? current : (((10 * 
preEmptionBatchSize) + current) % size);
{code}  

That gives a lot of flexibility and still a decent performance in a large 
cluster.


was (Author: wilfreds):
Two things:
* I still think limiting the number of nodes is something we need to approach 
with care.
* randomising a 10,000 entry long list each time we pre-empt will also become 
expensive.
 
I was thinking more of something like this:
{code:java}
  int preEmptionBatchSize = conf.getPreEmptionBatchSize();
  List<FSSchedulerNode> potentialNodes = 
scheduler.getNodeTracker().getNodesByResourceName(rr.getResourceName());
  int size = potentialNodes.size();
  int stop = 0;
  int current = 0;
  // find a start point somewhere in the list if it is long
  if (size > preEmptionBatchSize) {
Random rand = new Random();
current = rand.nextInt(size / preEmptionBatchSize) * preEmptionBatchSize;
stop = current;
  }
  do {
FSSchedulerNode mine = potentialNodes.get(current);
// Identify the containers

current++;
// flip at the end of the list  
if (current >= size) {
  current = 0;
}
  } while (current != stop);
{code}

Pre-emption runs in a loop and we could be considering different applications 
one after the other. Shuffling that node list continually is not good from a 
performance perspective. A simple cut-in like the above gives the same kind of 
behaviour. 
We could then still limit the number of "batches" we process. With some more 
smarts the stop condition could be based on the fact that we have processed, 
as an example, 10 * the batch size in nodes (a batch of nodes could be deemed 
equivalent to the number of nodes in a rack):
{code}  stop = ((10 * preEmptionBatchSize) > size) ? current : (((10 * 
preEmptionBatchSize) + current) % size);
{code}  

That gives a lot of flexibility and still a decent performance in a large 
cluster.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++){
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-9314) Fair Scheduler: Queue Info mistake when configured same queue name at same level

2019-03-07 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786734#comment-16786734
 ] 

Wilfred Spiegelenburg commented on YARN-9314:
-

Hi [~fengyongshe], thank you for filing this and providing a patch.

I have a couple of comments:
* The text in the exception needs clarification:
{{queuename (" + queueName + ") repeated defining in Allocation File}}
something like this is clearer:
{{queue name (" + queueName + ") is defined multiple times, queues can only be 
defined once.}}
* The {{exists}} method can be simplified:
{code}
public boolean exists(String queueName) {
  for (FSQueueType queueType : FSQueueType.values()) {
if (configuredQueues.get(queueType).contains(queueName)) {
  return true;
}
  }
  return false;
}
{code}
* instead of checking the text of the message in the exception it is better to 
use {{(expected = AllocationConfigurationException.class)}} on the test (see 
the sketch after this list). If we change the text the test would still pass, 
making maintenance easier. We already do that in a number of tests, 
{{testQueueAlongsideRoot}} being one example.
* the patch introduces a number of new checkstyle issues which should be fixed.
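
For illustration, a minimal sketch of the annotation-based check (the test 
name and loader field are assumptions modelled on the existing allocation 
file tests):
{code:java}
@Test(expected = AllocationConfigurationException.class)
public void testDuplicateQueueName() throws Exception {
  // the allocation file defines the same queue name twice at one level;
  // loading must throw, the message text itself is not asserted
  allocLoader.reloadAllocations();
}
{code}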

> Fair Scheduler: Queue Info mistake when configured same queue name at same 
> level
> 
>
> Key: YARN-9314
> URL: https://issues.apache.org/jira/browse/YARN-9314
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: fengyongshe
>Priority: Major
> Attachments: Fair Scheduler Mistake when configured same queue at 
> same level.png, YARN-9341.patch
>
>
> The queue info is configured in fair-scheduler.xml like below:
> {code:xml}
> <queue name="deva">
>   <minResources>3072mb,3vcores</minResources>
>   <maxResources>4096mb,4vcores</maxResources>
>   <queue name="sample">
>     <minResources>1024mb,1vcores</minResources>
>     <maxResources>2048mb,2vcores</maxResources>
>     <aclSubmitApps>Charlie</aclSubmitApps>
>   </queue>
> </queue>
> <queue name="deva">
>   <minResources>1024mb,1vcores</minResources>
>   <maxResources>2048mb,2vcores</maxResources>
> </queue>
> {code}
> The queue root.deva configured last will override the existing root.deva 
> that contains root.deva.sample, as shown in the attachment:
>  
> root.deva
> ||Used Resources:||
> ||Min Resources:| => should be <3072mb,3vcores>|
> ||Max Resources:| => should be <4096mb,4vcores>|
> ||Reserved Resources:||
> ||Steady Fair Share:||
> ||Instantaneous Fair Share:||
>  
> root.deva.sample
> ||Min Resources:||
> ||Max Resources:||
> ||Reserved Resources:||
> ||Steady Fair Share:||
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9343) Replace isDebugEnabled with SLF4J parameterized log messages

2019-03-07 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786572#comment-16786572
 ] 

Wilfred Spiegelenburg commented on YARN-9343:
-

yes I am fine with that. This patch is big enough to leave it like this.

I did not see any issues in the latest patch beside the ones we will open new 
jiras for. +1 (non binding) 

> Replace isDebugEnabled with SLF4J parameterized log messages
> 
>
> Key: YARN-9343
> URL: https://issues.apache.org/jira/browse/YARN-9343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9343-001.patch, YARN-9343-002.patch, 
> YARN-9343-003.patch
>
>
> Replace isDebugEnabled with SLF4J parameterized log messages. 
> https://www.slf4j.org/faq.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9343) Replace isDebugEnabled with SLF4J parameterized log messages

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786336#comment-16786336
 ] 

Wilfred Spiegelenburg edited comment on YARN-9343 at 3/7/19 3:28 AM:
-

Thank you for the update [~Prabhu Joseph]

I can see that we still have 200+ {{LOG.isDebugEnabled()}} calls in the code. 
Two things:
# There are a lot of simple one-parameter calls which could easily be 
converted to unguarded calls (see the sketch below), examples:
** NvidiaDockerV1CommandPlugin.java
** FSParentQueue.java
** Application.java
# Some of the calls to {{LOG.debug}} that are guarded inside those guards have 
not been changed to parameterised calls yet. Do you want to file a followup 
jira for that or should that also be part of these changes?
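
For the first point, this is the kind of conversion I mean (hypothetical 
variable name, the pattern is what matters):
{code:java}
// before: guard plus string concatenation
if (LOG.isDebugEnabled()) {
  LOG.debug("Assigned container " + containerId);
}
// after: a single unguarded call, SLF4J defers the formatting
LOG.debug("Assigned container {}", containerId);
{code}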


was (Author: wilfreds):
Thank you for the update [~Prabhu Joseph]

I can see that we still have 200+ {{LOG.isDebugEnabled()}} calls in the code. 
Two things:
# There are a lot of simple one parameter calls which could easily be converted 
to unguarded calls, examples:
* NvidiaDockerV1CommandPlugin.java
* FSParentQueue.java
* Application.java
# Some of the calls to {{LOG.debug}} that are guarded inside those have not 
been changed to parameterised calls yet. Do you want to file a followup jira 
for that or should that also be part of these changes?

> Replace isDebugEnabled with SLF4J parameterized log messages
> 
>
> Key: YARN-9343
> URL: https://issues.apache.org/jira/browse/YARN-9343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9343-001.patch, YARN-9343-002.patch, 
> YARN-9343-003.patch
>
>
> Replace isDebugEnabled with SLF4J parameterized log messages. 
> https://www.slf4j.org/faq.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9343) Replace isDebugEnabled with SLF4J parameterized log messages

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786336#comment-16786336
 ] 

Wilfred Spiegelenburg commented on YARN-9343:
-

Thank you for the update [~Prabhu Joseph]

I can see that we still have 200+ {{LOG.isDebugEnabled()}} calls in the code. 
Two things:
# There are a lot of simple one parameter calls which could easily be converted 
to unguarded calls, examples:
* NvidiaDockerV1CommandPlugin.java
* FSParentQueue.java
* Application.java
# Some of the calls to {{LOG.debug}} that are guarded inside those have not 
been changed to parameterised calls yet. Do you want to file a followup jira 
for that or should that also be part of these changes?

> Replace isDebugEnabled with SLF4J parameterized log messages
> 
>
> Key: YARN-9343
> URL: https://issues.apache.org/jira/browse/YARN-9343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9343-001.patch, YARN-9343-002.patch, 
> YARN-9343-003.patch
>
>
> Replace isDebugEnabled with SLF4J parameterized log messages. 
> https://www.slf4j.org/faq.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786314#comment-16786314
 ] 

Wilfred Spiegelenburg commented on YARN-9344:
-

The test failures are also not related: TestApplicationMasterServiceFair failed 
because it ran with the CapacityScheduler... Not sure what happened there.

[~uranus] This change should be easily testable in a junit test. We should not 
have a -1 from test4tests.
 Can you please add tests to TestFSAppAttempt to make sure that this is working 
as expected?
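
Something along these lines would do it; a sketch only, the helpers are the 
ones from {{FairSchedulerTestBase}} and the exact asserts may differ:
{code:java}
@Test
public void testNoReservationWhenContainerExceedsNode() throws Exception {
  // one 4GB/4core node, then ask for an 8GB/8core container
  RMNode node = MockNodes.newNodeInfo(1,
      Resources.createResource(4096, 4), 1, "127.0.0.1");
  scheduler.handle(new NodeAddedSchedulerEvent(node));
  ApplicationAttemptId attId =
      createSchedulingRequest(8192, 8, "queue1", "user1");
  scheduler.update();
  scheduler.handle(new NodeUpdateSchedulerEvent(node));
  assertEquals("Container larger than the node must not be reserved",
      0, scheduler.getSchedulerApp(attId).getReservedContainers().size());
}
{code}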

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch, YARN-9344.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786306#comment-16786306
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

Fixed the newly introduced checkstyle issues. The build should no longer show 
any whitespace issues.
Test failures are not related to the patch, uploading patch 007.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.007.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.007.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9326) Fair Scheduler configuration defaults are not documented in case of min and maxResources

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786290#comment-16786290
 ] 

Wilfred Spiegelenburg commented on YARN-9326:
-

The white space issues are fixed via YARN-9348. A new build should not show 
them anymore.

The text looks good to me now. [~templedf] you did a lot of the work around 
resource types. Does this change look good to you from that perspective or 
should we extend the new format examples with a resource type tag like this to 
make it really clear:
{code}
"vcores=X, memory-mb=Y, GPU=5"
{code}

> Fair Scheduler configuration defaults are not documented in case of min and 
> maxResources
> 
>
> Key: YARN-9326
> URL: https://issues.apache.org/jira/browse/YARN-9326
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: docs, documentation, fairscheduler, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9326.001.patch, YARN-9326.002.patch, 
> YARN-9326.003.patch, YARN-9326.004.patch, YARN-9326.005.patch
>
>
> The FairScheduler's configuration has the following defaults (from the code: 
> javadoc):
> {noformat}
> In new style resources, any resource that is not specified will be set to 
> missing or 0%, as appropriate. Also, in the new style resources, units are 
> not allowed. Units are assumed from the resource manager's settings for the 
> resources when the value isn't a percentage. The missing parameter is only 
> used in the case of new style resources without percentages. With new style 
> resources with percentages, any missing resources will be assumed to be 100% 
> because percentages are only used with maximum resource limits.
> {noformat}
> This is not documented in the hadoop yarn site FairScheduler.html. It is 
> quite intuitive, but still need to be documented though.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: (was: YARN-8967.006.patch)

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-03-06 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.006.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch, YARN-8967.006.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9343) Replace isDebugEnabled with SLF4J parameterized log messages

2019-03-05 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785182#comment-16785182
 ] 

Wilfred Spiegelenburg commented on YARN-9343:
-

Hi [~Prabhu Joseph], I see a lot of changes between patch 1 and patch 2. Patch 
2 contains about 50 more changed files. Can you explain what was done? 
I see that there are a large number of new files in patch 2, but I also miss 
some files in patch 2 that were in patch 1:
* 55 new files in patch 2
* 5 files removed from patch 2


> Replace isDebugEnabled with SLF4J parameterized log messages
> 
>
> Key: YARN-9343
> URL: https://issues.apache.org/jira/browse/YARN-9343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9343-001.patch, YARN-9343-002.patch
>
>
> Replace isDebugEnabled with SLF4J parameterized log messages. 
> https://www.slf4j.org/faq.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-05 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785164#comment-16785164
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

Thank you [~yufeigu] I will follow up with the real integration as part of 
YARN-8967.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, 
> YARN-9298.006.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9341) Reentrant lock() before try

2019-03-05 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785162#comment-16785162
 ] 

Wilfred Spiegelenburg commented on YARN-9341:
-

So we do have one, and also one {{lockInterruptibly()}} in another part. The 
change as proposed by [~Prabhu Joseph] has left those two unchanged; neither 
is covered under the description of the jira either, which just talks about 
the {{lock()}} cases.

The only replacements that have been made are the direct calls to {{lock()}}; 
neither of the two other ones has been touched. That is what I based my +1 on.

> Reentrant lock() before try
> ---
>
> Key: YARN-9341
> URL: https://issues.apache.org/jira/browse/YARN-9341
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9341-001.patch
>
>
> As a best practice - Reentrant lock has to be acquired before try clause. 
> https://stackoverflow.com/questions/31058681/java-locking-structure-best-pattern
> There are many places where lock is obtained inside try.
> {code}
> try {
>this.writeLock.lock();
>   
> } finally {
>   this.writeLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9344) FS should not reserve when container capability is bigger than node total resource

2019-03-05 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785103#comment-16785103
 ] 

Wilfred Spiegelenburg commented on YARN-9344:
-

Hi [~uranus], yes this is a problem, nice catch.

However, don't we have a more generic problem with the fact that we offer the 
node to this application attempt at all? The reservation is one thing, but I 
think we should shortcut this assignment completely. If the specific request 
does not fit at all we need to move to the next request for the application 
attempt.
That would mean we need to move it one call up, into 
{{assignContainer(FSSchedulerNode node, boolean reserved)}}, instead of where 
it is now.

Does that make sense to you?
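
In code the shortcut would amount to an early exit like this (a sketch; the 
exact ask object depends on where the check ends up living):
{code:java}
// a request larger than the whole node can never be satisfied by it,
// so skip both the assignment and the reservation attempt
if (!Resources.fitsIn(pendingAsk.getPerAllocationResource(),
    node.getTotalResource())) {
  return Resources.none();
}
{code}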

> FS should not reserve when container capability is bigger than node total 
> resource
> --
>
> Key: YARN-9344
> URL: https://issues.apache.org/jira/browse/YARN-9344
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9344.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9341) Reentrant lock() before try

2019-03-04 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783987#comment-16783987
 ] 

Wilfred Spiegelenburg commented on YARN-9341:
-

An {{IllegalMonitorStateException}} can only happen on unlock if the current 
thread is not the owner of the lock.
I don't think we use {{tryLock}} or {{lockInterruptibly}} anywhere in our code 
and thus do not need to worry about the {{IllegalMonitorStateException}}. When 
you call lock the thread is blocked until the point you acquire the lock. We 
should thus never proceed beyond the lock line, and the finally clause should 
never be executed until after the thread has the lock.

The change proposed is even following the java [API 
doc|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantLock.html]
 for the locking.
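
That is, the pattern the patch moves everything to (a minimal sketch):
{code:java}
this.writeLock.lock();   // acquire outside try: if lock() ever failed
try {                    // we would never reach the finally block
  // ... update the protected state ...
} finally {
  this.writeLock.unlock();
}
{code}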

+1 (non binding)

> Reentrant lock() before try
> ---
>
> Key: YARN-9341
> URL: https://issues.apache.org/jira/browse/YARN-9341
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9341-001.patch
>
>
> As a best practice - Reentrant lock has to be acquired before try clause. 
> https://stackoverflow.com/questions/31058681/java-locking-structure-best-pattern
> There are many places where lock is obtained inside try.
> {code}
> try {
>this.writeLock.lock();
>   
> } finally {
>   this.writeLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9343) Replace isDebugEnabled with SLF4J parameterized log messages

2019-03-04 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783952#comment-16783952
 ] 

Wilfred Spiegelenburg commented on YARN-9343:
-

Hi [~Prabhu Joseph], thank you for this update. I have looked at it a couple of 
times and just updated the parts that I touched. It is good to have this done 
globally.

I do have some remarks:
* I saw an inconsistency in how we log exceptions. In some places we use 
{{debug(ex.getMessage());}} while in others we just use {{debug({}, ex);}}. It 
would be good to come to a standard way of logging them.
* Again for consistency's sake: in the case that we just log the exception it 
would be nice to add that to the message text itself so we know that it is 
ignored; we do it in a number of places but not everywhere.
* In {{CombinedResourceCalculator}} we have two consecutive LOG.debug 
statements in the diff, only one is replaced.
* Do we need to use {{String.valueOf(pullImageTimeMs)}} in 
{{DockerLinuxContainerRuntime}}? Can we not just pass the value directly?
 * In {{ResourceLocalizationService}} you have missed an object reference in 
the text:
{code:java}
 LOG.debug("Skip downloading resource: {} since it's in"
+ " state: ", key, rsrc.getState());
{code}
* In {{AmIpFilter}} you have removed the guard but not changed the format 
string etc.
{code}
LOG.debug("Could not find " + WebAppProxyServlet.PROXY_USER_COOKIE_NAME
   + " cookie, so user will not be set");
{code}

I saw a couple of cases in which we are doing expensive operations in preparing 
the objects just for logging. Should we not keep the guard around them to 
prevent the overhead:
* TimelineUtils.dumpTimelineRecordtoJSON(entity)
* Arrays.toString(fullCommandArray)
* StringUtils.join(",", assignedResources)

Can you also check the checkstyle issues and clean up the line-breaking string 
concats you are using?
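
To make the first two remarks concrete, a sketch of one consistent style (the 
message and variable names are illustrative only):
{code:java}
// parameterized message with the throwable as the last argument so the
// stack trace is kept; the text states the failure is ignored
LOG.debug("Ignoring failure to localize {}", key, ex);

// keep the guard only where building the argument is expensive
if (LOG.isDebugEnabled()) {
  LOG.debug("Full command: {}", Arrays.toString(fullCommandArray));
}
{code}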

> Replace isDebugEnabled with SLF4J parameterized log messages
> 
>
> Key: YARN-9343
> URL: https://issues.apache.org/jira/browse/YARN-9343
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9343-001.patch
>
>
> Replace isDebugEnabled with SLF4J parameterized log messages. 
> https://www.slf4j.org/faq.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-03 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782932#comment-16782932
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

# saw that, had them already fixed in a new version
# fixed that one and also made the {{QueueManager}} private and introduced a 
getter for it. It is only set in the class itself and needed outside when the 
parent rule is run (that fixes the 3rd checkstyle issue)
# It should have been true from the start, changed the init to true.

Also removed two unneeded casts in the {{setConfig}} method.

I think that is it [~yufeigu]

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, 
> YARN-9298.006.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-03 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.006.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, 
> YARN-9298.006.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-03-03 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782897#comment-16782897
 ] 

Wilfred Spiegelenburg commented on YARN-6487:
-

It works both ways:

Before we can schedule we need to update the current usage and shares etc. This 
runs in an update thread. Continuous scheduling triggers that update. The 
heartbeats, when they are processed, do the same. This updating requires a 
lock on the scheduler, as does the scheduling process itself. The extra update 
demand is the trigger.
So you get into a state where the heartbeats, the updates and the scheduling 
itself are all waiting for the lock. The larger the number of nodes, the 
larger the number of applications is (in most cases) and the larger the number 
of queues (again in most cases). All this combined causes processing to start 
lagging, and continuous scheduling really loses its function.

Node numbers influence continuous scheduling and the other way around.

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9326) Fair Scheduler configuration defaults are not documented in case of min and maxResources

2019-03-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781720#comment-16781720
 ] 

Wilfred Spiegelenburg commented on YARN-9326:
-

Thank you for the update [~adam.antal].
I see the same message when building with those options.

For remark 4: it should have been {{maxContainerAllocation}} as you said. 
Please update the text for the vcores.

For all the {{max}} settings: if the new definition is used, all resource 
types are given and/or set. Old ones will only set memory and cores and leave 
unspecified ones set to 0. All of them, including the unspecified ones, are 
checked recursively up the queue tree. The *root* queue values are set via 
yarn.scheduler.maximum* and the resource type config.

 I might not have been completely clear in my comment #6. I am missing the fact 
that the {{maxResources}} limit is also enforced recursively. A queue will not 
be assigned a container if that assignment would put the queue or its parent(s) 
over the maximum resources. It is the same for maxima assigned via 
{{maxResources}} on static queues and {{maxChildResources}} for dynamic queues.

> Fair Scheduler configuration defaults are not documented in case of min and 
> maxResources
> 
>
> Key: YARN-9326
> URL: https://issues.apache.org/jira/browse/YARN-9326
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: docs, documentation, fairscheduler, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9326.001.patch, YARN-9326.002.patch, 
> YARN-9326.003.patch, YARN-9326.004.patch
>
>
> The FairScheduler's configuration has the following defaults (from the code: 
> javadoc):
> {noformat}
> In new style resources, any resource that is not specified will be set to 
> missing or 0%, as appropriate. Also, in the new style resources, units are 
> not allowed. Units are assumed from the resource manager's settings for the 
> resources when the value isn't a percentage. The missing parameter is only 
> used in the case of new style resources without percentages. With new style 
> resources with percentages, any missing resources will be assumed to be 100% 
> because percentages are only used with maximum resource limits.
> {noformat}
> This is not documented in the hadoop yarn site FairScheduler.html. It is 
> quite intuitive, but still need to be documented though.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781661#comment-16781661
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

Last version also fixes the {{createQueue}} flag and removes unchecked casts 
from the test code.
[~yufeigu] Please ignore patch 004 and check 005.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-01 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.005.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781648#comment-16781648
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

# I added the abstract {{FSPlacementRule}} and moved things into it. I do not 
want to add the {{FSPlacementRule}} into the {{PlacementFactory}}; because of 
that I want to keep a blank {{setConfig}} in that definition. I am not happy 
with the {{createQueue}} that is left there and am still trying to get to a fix 
for that without too much impact.
# That was a good catch; the build and my check of it did not pick that one up.
# I can live with either solution, changed it to your preferred way.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-03-01 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.004.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch, YARN-9298.004.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780108#comment-16780108
 ] 

Wilfred Spiegelenburg commented on YARN-6487:
-

The removal of continuous scheduling was/is based on performance numbers and 
locking issues.

Continuous scheduling was introduced to help speed up allocating containers in 
a small cluster that did not have a large number of heartbeats coming in. This 
would happen in clusters that were running a mixed load of containers with an 
emphasis on longer running containers. In those clusters the NM heartbeats 
would hold up assigning containers when a burst of requests would come in.

The side effect is however that when a cluster grows (100+ nodes) the number of 
heartbeats that need processing starts interfering with the continuous 
scheduling thread and other internal threads. This causes thread starvation 
and in the worst case scheduling comes to a standstill.
The improvements made in the scheduler, which now allow assigning multiple 
containers per heartbeat while still spreading the load over multiple nodes, 
have made continuous scheduling unneeded in all but the smallest clusters. In 
those clusters lowering the NM heartbeat interval can be used as a workaround.
So we really do not need it anymore. If turned on in large clusters it can 
cause a lot of side effects, which is why we decided to deprecate it.

We could think about completely decoupling scheduling from the NM heartbeat to 
remove the locking but that would be a far bigger task which affects all 
schedulers.
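
For reference, a minimal sketch of the two knobs mentioned above in 
yarn-site.xml (the property names are the standard YARN/FairScheduler ones, the 
values are only illustrative):
{code:xml}
<!-- the deprecated FairScheduler switch discussed above: leave it off -->
<property>
  <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
  <value>false</value>
</property>
<!-- on very small clusters a shorter NM heartbeat gives the scheduler more
     allocation opportunities and replaces continuous scheduling -->
<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>500</value>
</property>
{code}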

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780101#comment-16780101
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

Two things:
* I still think limiting the number of nodes is something we need to approach 
with care.
* Randomising a 10,000 entry long list each time we pre-empt will also become 
expensive.
 
I was thinking more of something like this:
{code:java}
  int preEmptionBatchSize = conf.getPreEmptionBatchSize();
  List<FSSchedulerNode> potentialNodes =
      scheduler.getNodeTracker().getNodesByResourceName(rr.getResourceName());
  int size = potentialNodes.size();
  int current = 0;
  // find a start point somewhere in the list if it is long
  if (size > preEmptionBatchSize) {
    Random rand = new Random();
    current = rand.nextInt(size / preEmptionBatchSize) * preEmptionBatchSize;
  }
  // remember the start point so we stop after one full pass over the list
  int stop = current;
  do {
    FSSchedulerNode mine = potentialNodes.get(current);
    // Identify the containers

    current++;
    // flip at the end of the list
    if (current >= size) {
      current = 0;
    }
  } while (current != stop);
{code}

Pre-emption runs in a loop and we could be considering different applications 
one after the other. Continually shuffling that node list is not good from a 
performance perspective. A simple cut-in like the above gives the same kind of 
behaviour.
We could then still limit the number of "batches" we process. With some more 
smarts the stop condition could be based on having processed, as an example, 
10 * the batch size in nodes (a batch of nodes could be deemed equivalent to 
the number of nodes in a rack):
{code}
  stop = ((10 * preEmptionBatchSize) > size) ? current
      : ((10 * preEmptionBatchSize) + current) % size;
{code}

That gives a lot of flexibility and still a decent performance in a large 
cluster.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779358#comment-16779358
 ] 

Wilfred Spiegelenburg edited comment on YARN-9298 at 2/28/19 2:11 AM:
--

1) oops, copy-paste error, fixed now
2) yep, you're right, replaced the text
3) added
4) The tests we have in YARN-8967 are up a level: they test the rules as part 
of a list of rules and not really every rule independently. They do not 
check the rule config/init parts. I have added new tests for all rules in the 
{{TestPlacementRuleFS}} class for config and init. I would like to leave the 
placement checks in the policy for clarity.
5) You cannot use a switch with an Object as the input as per the [java 
docs|https://docs.oracle.com/javase/tutorial/java/nutsandbolts/switch.html]. To 
do that we would need to switch on a string object compared to the Class name 
which I don't think is a good idea as it is discouraged due to false 
positives/negatives and class loader dependencies.
6) For the {{setConfig()}}:
* moving the Object check out will pollute the abstract class with 
FairScheduler dependencies and two extra {{setConfig()}} methods. Those 2 
methods would be _noop_ implementations in the abstract class, which I think is 
more confusing when you look at it from other schedulers.
* The only part that could possibly be pulled out is getting the create flag 
out; that is done in this version of the patch.

6) I looked at {{initialize()}} but that is not really possible:
* Moving the scheduler check out is not possible, especially not into the 
abstract class.
* The check for the parent rule outside the class itself does not make it any 
cleaner. Two different cases are handled in the same code lines (not allowed 
and not the same class). Moving them makes it really messy.



was (Author: wilfreds):
1) oops, copy-paste error, fixed now
2) yep, you're right, replaced the text
3) added
4) The tests we have in YARN-8967 are up a level: they test the rules as part 
of a list of rules and not really every rule independently. They do not 
check the rule config/init parts. I have added new tests for all rules in the 
{{TestPlacementRuleFS}} class for config and init. I would like to leave the 
placement checks in the policy for clarity.
5) You cannot use a switch with an Object as the input as per the [java 
docs|https://docs.oracle.com/javase/tutorial/java/nutsandbolts/switch.html]. To 
do that we would need to switch on a string object compared to the Class name 
which I don't think is a good idea as it is discouraged due to false 
positives/negatives and class loader dependencies.
6) For the {{setConfig()}}:
* moving the Object check out will pollute the abstract class with 
FairScheduler dependencies and two extra {{setConfig()}} methods. Those 2 
methods will be _noop_ implementations in the abstract class. I think more 
confusing when you look at it from other schedulers.
* The only part that could possibly be pulled out is getting the create flag 
out that is done in this version of the patch.
6) I looked at {{initialize()}} but that is not really possible:
* Moving the scheduler check out is not possible, especially not into the 
abstract class.
* The check for the parent rule outside the class itself does not make it any 
cleaner. Two different cases are handled in the same code lines (not allowed 
and not the same class). Moving them makes it really messy.


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9326) Fair Scheduler configuration defaults are not documented in case of min and maxResources

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780015#comment-16780015
 ] 

Wilfred Spiegelenburg commented on YARN-9326:
-

Thank you for the updated patch [~adam.antal]

Some further text comments:
* For minResources this text is no longer correct when you take into account 
resource types: {{For the single-resource fairness policy, the vcores value is 
ignored.}} It should mention that it only uses the memory setting, since we can 
have more resource types than just 2.
* This is not correct: {{**maxResources**: maximum resources a queue will 
allocated.}} It should be something like _can be allocated_, not _will 
allocated_.
* Same text here needs to be fixed: {{**maxChildResources**: maximum resources 
an ad hoc child queue will allocated.}}
* For the maxChildResources this is not correct: {{It's default value is 
**yarn.scheduler.maximum-allocation-mb**.}} as it ignores types and even the 
vcores:
** It should mention the vcore equivalent for the yarn config.
** The scheduler max allocation, which is again a resource object and thus can 
set a limit on all resource types (via the resource type config file).
* This sentence should not be in the maxChildResources: {{In the latter case the 
units will be inferred from the default units configured for that resource.}}
* A child queue limit is enforced recursively: a queue will thus not be 
assigned a container if that assignment would put the child queue or its 
parent(s) over the maximum resources.
* In the last changed sentence: _or maximum_ change to _or to the maximum_

> Fair Scheduler configuration defaults are not documented in case of min and 
> maxResources
> 
>
> Key: YARN-9326
> URL: https://issues.apache.org/jira/browse/YARN-9326
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: docs, documentation, fairscheduler, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9326.001.patch, YARN-9326.002.patch, 
> YARN-9326.003.patch
>
>
> The FairScheduler's configuration has the following defaults (from the code: 
> javadoc):
> {noformat}
> In new style resources, any resource that is not specified will be set to 
> missing or 0%, as appropriate. Also, in the new style resources, units are 
> not allowed. Units are assumed from the resource manager's settings for the 
> resources when the value isn't a percentage. The missing parameter is only 
> used in the case of new style resources without percentages. With new style 
> resources with percentages, any missing resources will be assumed to be 100% 
> because percentages are only used with maximum resource limits.
> {noformat}
> This is not documented in the hadoop yarn site FairScheduler.html. It is 
> quite intuitive, but still need to be documented though.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.006.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, 
> YARN-8967.006.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.003.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch, 
> YARN-9298.003.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779358#comment-16779358
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

1) oops, copy-paste error, fixed now
2) yep, you're right, replaced the text
3) added
4) The tests we have in YARN-8967 are up a level: they test the rules as part 
of a list of rules and not really every rule independently. They do not 
check the rule config/init parts. I have added new tests for all rules in the 
{{TestPlacementRuleFS}} class for config and init. I would like to leave the 
placement checks in the policy for clarity.
5) You cannot use a switch with an Object as the input as per the [java 
docs|https://docs.oracle.com/javase/tutorial/java/nutsandbolts/switch.html]. To 
do that we would need to switch on a string object compared to the Class name 
which I don't think is a good idea as it is discouraged due to false 
positives/negatives and class loader dependencies.
6) For the {{setConfig()}}:
* moving the Object check out will pollute the abstract class with 
FairScheduler dependencies and two extra {{setConfig()}} methods. Those 2 
methods would be _noop_ implementations in the abstract class, which I think is 
more confusing when you look at it from other schedulers.
* The only part that could possibly be pulled out is getting the create flag 
out; that is done in this version of the patch.
6) I looked at {{initialize()}} but that is not really possible:
* Moving the scheduler check out is not possible, especially not into the 
abstract class.
* The check for the parent rule outside the class itself does not make it any 
cleaner. Two different cases are handled in the same code lines (not allowed 
and not the same class). Moving them makes it really messy.


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9326) Fair Scheduler configuration defaults are not documented in case of min and maxResources

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779296#comment-16779296
 ] 

Wilfred Spiegelenburg commented on YARN-9326:
-

Thanks [~adam.antal] for following up on YARN-8662

* I looked at the documentation and am missing the changes to the 
{{minResources}}. As I stated in YARN-8662 the {{minResources}} tag also 
handles % signs in its definition. That is not mentioned in any style example. 
We need to add it for new and old style definitions.
* New style resources for all settings can use % which is not shown. I am thus 
still missing an example for new style resources that use the percentage form 
in all settings (see the fuller sketch after this list):
{code}
vcores=X%, memory-mb=Y%
{code}
* The other thing that is still not clear in the update: resource types that 
are not specified only default to either 0 or the maximum when new style 
resources are used. That should be combined with making it even clearer that 
resource types should *not* be used in combination with old style definitions.
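
To make that concrete, a sketch of a queue that uses the new style percentage 
form in all three settings (my own illustration, not text from the patch):
{code:xml}
<queue name="sample_queue">
  <!-- new style: every resource named, percentages allowed -->
  <minResources>vcores=10%, memory-mb=10%</minResources>
  <maxResources>vcores=50%, memory-mb=50%</maxResources>
  <maxChildResources>vcores=25%, memory-mb=25%</maxChildResources>
</queue>
{code}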

> Fair Scheduler configuration defaults are not documented in case of min and 
> maxResources
> 
>
> Key: YARN-9326
> URL: https://issues.apache.org/jira/browse/YARN-9326
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: docs, documentation, fairscheduler, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9326.001.patch, YARN-9326.002.patch
>
>
> The FairScheduler's configuration has the following defaults (from the code: 
> javadoc):
> {noformat}
> In new style resources, any resource that is not specified will be set to 
> missing or 0%, as appropriate. Also, in the new style resources, units are 
> not allowed. Units are assumed from the resource manager's settings for the 
> resources when the value isn't a percentage. The missing parameter is only 
> used in the case of new style resources without percentages. With new style 
> resources with percentages, any missing resources will be assumed to be 100% 
> because percentages are only used with maximum resource limits.
> {noformat}
> This is not documented in the hadoop yarn site FairScheduler.html. It is 
> quite intuitive, but still need to be documented though.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776502#comment-16776502
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

[~uranus] I can understand that you want to limit the number of nodes to look 
at for pre-emption in large clusters. It could speed things up in certain 
cases. However, when I look at the way we identify containers, we already break 
out of the loop when we get to a node that gives back a container list without 
AMs. In {{identifyContainersToPreemptForOneContainer}} we break out of the loop 
checking nodes when {{numAMContainers}} is 0, so we do already stop looking for 
suitable nodes early.

Based on your comment this change will introduce a trade-off between AMs 
and nodes. You propose to stop checking nodes even if we still have AMs in the 
list. In other words you are willing to accept some AMs in the list even if 
that has side effects on those applications. I don't think that is a good 
idea.

I do agree with you that for the ANY resource we probably want to do something 
else and not just grab the first nodes out of the list all the time. The list 
that comes back from the node tracker is unsorted and just a copy of what is 
known, without a filter. We should introduce some logic instead of just using a 
for loop to run over the list from the start. If we use a seeded start point 
somewhere in the list, one which moves around, we spread our preemption better.
We could base the starting point on the current time (in seconds) and the size 
of the list returned. I don't think we need that if the list is smaller than a 
hard coded number (maybe 50 or 100) but it would really help in large clusters.
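
A minimal sketch of that seeded start point (illustrative only, not from a 
patch; potentialNodes stands for the unsorted list that comes back from the 
node tracker and 100 for the hard coded number):
{code:java}
int size = potentialNodes.size();
int start = 0;
// rotate the start index with the clock so successive pre-emption passes
// begin at different nodes; short lists are still walked from the front
if (size > 100) {
  start = (int) ((System.currentTimeMillis() / 1000L) % size);
}
{code}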


> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9323) FSLeafQueue#computeMaxAMResource does not override zero values for custom resources

2019-02-21 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774779#comment-16774779
 ] 

Wilfred Spiegelenburg commented on YARN-9323:
-

Hi [~snemeth] Some comments on this change as it includes a number of changes 
that are not related to fixing the issue. 

These changes just increase the size of the fix:
* The {{FairScheduler}} change seems to be just a layout change
* in the FSLeafQueue we have similar changes around {{setMemorySize}} and 
{{setVirtualCores}}
* {{computeMaxAMResource}} javadoc changes are unneeded
* import re-ordering in the TestFSLeafQueue is unneeded

These two should be fixed:
* checkstyle issue: _MAX_AM_SHARE_ in {{TestFSLeafQueue}} should be final
* whitespace issue: line 219 of the patch

The rest should wait until we have a test run with YARN-9322 committed


> FSLeafQueue#computeMaxAMResource does not override zero values for custom 
> resources
> ---
>
> Key: YARN-9323
> URL: https://issues.apache.org/jira/browse/YARN-9323
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9323.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8662) Fair Scheduler stops scheduling when a queue is configured only CPU and memory

2019-02-21 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774708#comment-16774708
 ] 

Wilfred Spiegelenburg commented on YARN-8662:
-

Hi [~Sen Zhao], Thank you for filing this and providing a patch.
I had some time and finally got around to looking at this for a review. Are you 
still willing to work on this?

It looks like this issue only happens if you use old style resource definitions 
for the __ entries.
The java doc for {{parseResourceConfigValue}} states:
{code}
   * The {@code missing} parameter is only used in the case of new style
   * resources without percentages. With new style resources with percentages,
   * any missing resources will be assumed to be 100% because percentages are
   * only used with maximum resource limits.
{code}
Which means that the code is doing what is documented. You are using old 
style resource definitions. Your change is going to break this as it will now 
use the missing parameter for old style resource definitions without 
percentages as well.
The workaround would be to use the new style declaration and the maximum would 
be set according to what you would expect. Old style declarations are there for 
backwards compatibility. When using resource types you really should be using 
the new style definitions.

If we still want to go down this path and make old style behave more like the 
new style then we have a number of other changes that need to be made:
* make a change similar to what you have now
* clean up the java doc
* clean up user documentation as minimum can take a percentage which is not 
documented at all
* fix the percentage for old style: we need to handle min resources too as now 
the min for any custom type is 100% of the cluster.

If we do not go down this path we should at least fix the two documentation 
points and document that you should use the new style definitions for min and 
max when you use resource types.
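
For the case in the description the new style declaration would look something 
like this (illustrative values, using the resource1 type from the report):
{code:xml}
<maxResources>memory-mb=4096, vcores=4, resource1=10</maxResources>
{code}
With the new style form every resource type can be named explicitly, which is 
what the workaround relies on.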

> Fair Scheduler stops scheduling when a queue is configured only CPU and memory
> --
>
> Key: YARN-8662
> URL: https://issues.apache.org/jira/browse/YARN-8662
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: NonResourceToSchedule.png, YARN-8662.001.patch
>
>
> Add a new resource type in resource-types.xml, eg: resource1. 
> In Fair scheduler when queue's MaxResources is configured like: 
> {code}4096 mb, 4 vcores{code}
> When submit a application which need resource like:
> {code} 1536 mb, 1 vcores, 10 resource1{code}
> The application will be pending. Because there is no resource1 in this queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-20 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773708#comment-16773708
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

Thank you for the review [~yufeigu]; it took a bit longer than expected working 
on 4 and 5 without polluting the code too much.
1) done added to all files changed
2) added tests for:
* FairQueuePlacementUtils
* PlacementFactory
* PlacementRule (FS added parts)
3) removed the extra line
4) That is how I started the implementation. I ran into a number of problems 
while instantiating the rules in the policy and then moved to this model. I 
have it working now without polluting the factory and/or rule with lots of FS 
specific classes.
5) Done that as part of the rewrite for 4)
6) updated the javadoc for the method
7) fixed
8) removed, the exception is already logged higher up in the stack


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-20 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.002.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-14 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768900#comment-16768900
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

[~cheersyang] Can you please check this?

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9308) fairscheduler-statedump.log gets generated regardless of service again after the merge of HDFS-7240

2019-02-14 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9308:

Attachment: YARN-9308.001.patch

> fairscheduler-statedump.log gets generated regardless of service again after 
> the merge of HDFS-7240
> ---
>
> Key: YARN-9308
> URL: https://issues.apache.org/jira/browse/YARN-9308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler
>Affects Versions: 3.2.0
>Reporter: Akira Ajisaka
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
> Attachments: YARN-9308.001.patch
>
>
> After the merge of HDFS-7240, YARN-6453 occurred again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9308) fairscheduler-statedump.log gets generated regardless of service again after the merge of HDFS-7240

2019-02-14 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768891#comment-16768891
 ] 

Wilfred Spiegelenburg commented on YARN-9308:
-

The changes from [HDFS-7240 git commit 
fixup|https://github.com/apache/hadoop/commit/2adda92de1535c0472c0df33a145fa1814703f4f]
added the log config lines back without the comment marks.
I will upload a patch to fix it up again.

> fairscheduler-statedump.log gets generated regardless of service again after 
> the merge of HDFS-7240
> ---
>
> Key: YARN-9308
> URL: https://issues.apache.org/jira/browse/YARN-9308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler
>Affects Versions: 3.2.0
>Reporter: Akira Ajisaka
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>
> After the merge of HDFS-7240, YARN-6453 occurred again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9308) fairscheduler-statedump.log gets generated regardless of service again after the merge of HDFS-7240

2019-02-14 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YARN-9308:
---

Assignee: Wilfred Spiegelenburg

> fairscheduler-statedump.log gets generated regardless of service again after 
> the merge of HDFS-7240
> ---
>
> Key: YARN-9308
> URL: https://issues.apache.org/jira/browse/YARN-9308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, scheduler
>Affects Versions: 3.2.0
>Reporter: Akira Ajisaka
>Assignee: Wilfred Spiegelenburg
>Priority: Blocker
>
> After the merge of HDFS-7240, YARN-6453 occurred again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-13 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767164#comment-16767164
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

The junit test failures are not related to this change.

[~asuresh] could you please review this as you did the unifying code work?

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-13 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767167#comment-16767167
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

The JUnit test failure seems unrelated.
The "no new tests" flag is correct: those will follow with the integration into 
the scheduler.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766813#comment-16766813
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

After talking offline with a number of people the request was to divide this 
change into two parts due to its size:
* _part 1_ for the new rules and changes to the existing PlacementRule code
* _part 2_ for the FS changes and integration

It is the only way the change can be split while keeping both parts compiling 
separately. A new jira, YARN-9298, is open for _part 1_ and we'll keep this 
jira for _part 2_. Removing "patch available" until that one is checked in.

It will also allow work to start on enhancing the rules with filters etc., 
which have existing open jiras.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766811#comment-16766811
 ] 

Wilfred Spiegelenburg commented on YARN-9277:
-

I agree with [~Steven Rand]: sorting could be good, but setting a hard no-go 
could cause issues.

Can you also explain how we can pre-empt a container that is owned by the 
application itself? 
I thought that we would only allow containers to be pre-empted if the 
application is over its fair share and even then only if pre-empting the 
container would not drop the application below its fair share. The 
{{FSPreemptionThread.identifyContainersToPreemptOnNode()}} calls 
{{app.canContainerBePreempted()}} which contains that check and the container 
is not added. Since the app we are pre-empting for is under its fair share, any 
container of the app itself should be filtered out by that. Am I reading this 
all wrong, or have you found cases where we did pre-empt a container for its 
own app and it is not working as expected?

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt high priority job
>  * We should not preempt container which has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.001.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-9298:
---

 Summary: Implement FS placement rules using PlacementRule interface
 Key: YARN-9298
 URL: https://issues.apache.org/jira/browse/YARN-9298
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Implement existing placement rules of the FS using the PlacementRule interface.

Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766771#comment-16766771
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

Updated the test to make it more robust. I ran all the new tests locally 250 
times and have not seen a failure.

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-1655:

Attachment: YARN-1655.003.patch

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765949#comment-16765949
 ] 

Wilfred Spiegelenburg commented on YARN-8655:
-

Hi [~uranus], I am not saying that what we do now is 100% correct. I am only 
doubting how often this occurs and what the impact on the application and 
scheduling activities is. Based on the analysis I did I think we need a 
solution for this case that has far less impact. Do we know any of the 
following:
* How badly does it affect the running applications; do we pre-empt double what 
we should?
* Does not handling this correctly slow down pre-emption?
* Is there another impact of not handling the edge case?

Pre-emption currently runs almost continually and is gated by the {{take()}}: 
when there is a pre-emption waiting we handle it. The patch changes this into 
one pre-emption per second. It effectively throttles the pre-emption down from 
processing applications as they arrive to a slow scheduled trickle.
When I look at how we calculate and decide if the application is marked as 
minimum share starved the cases should be limited. Even if the application is 
fair share starved and the queue is min share starved we do not automatically 
mark the application as min share starved. We thus only have this edge case for 
a small number of applications.
Fixing that edge case by slowing down all pre-emption handling is, I think, 
not right.


> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe, this may make one starve app is processed 
> for two times continuously.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken but appBeingProcessed is not yet 
> update to app1. At the moment, app1 is *starved by min share*, so this app 
> is added to appsToProcess again! Because appBeingProcessed is null and 
> appsToProcess also have not this one. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-11 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765733#comment-16765733
 ] 

Wilfred Spiegelenburg commented on YARN-8655:
-

Looking at how we get to adding an application to the starved list I don't 
think this is a thread safety issue.

I do agree that we could process the application twice. Fair share starvation 
and min share starvation are two different things. The queue is starved for min 
share and the application is starved for fair share. This does not mean that it 
is a problem. If the application is starved for fair share the calculation of 
the queue min share starvation already takes that fact into account.

The {{updateStarvedAppsMinshare()}} deducts any fair share starvation already 
processed for applications from the possible min share starvation. This means 
two things for an application that is marked for min share starvation:
# the application fair share starvation is less than the distributed min share 
starvation of the queue 
# the application has an outstanding demand that is higher than its fair share 
starvation
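
As a numeric illustration (numbers invented for this example): an app with a 
fair share of 10 GB, a usage of 8 GB and a demand of 14 GB has a fair share 
starvation of 2 GB. It is only also marked min share starved when the min share 
starvation the queue distributes to it exceeds that 2 GB, and the app still has 
outstanding demand beyond those 2 GB.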

The chance is small that an application is starved for fair share with a demand 
higher than its fair share starvation, while at the same time the distributed 
queue minimum share is higher than that fair share starvation.

It could be worth fixing if it has a high impact. The way you are proposing to 
fix it in the patch is however not right. You introduce a {{Thread.sleep()}} 
call in the pre-emption thread, which is not correct. Currently the pre-emption 
will happen when a starved app is added and no pre-emption is in progress. With 
the change there is only 1 pre-emption per second. This is a high impact change 
and I think we need to come up with a smarter way to handle this case with less 
of an impact on the pre-emption itself.
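
One lower impact direction could be to track queued and in-flight apps in a 
concurrent set instead of sleeping (a sketch of the idea only, not a reviewed 
patch; the {{doneProcessing}} hook is my assumption about where the pre-emption 
thread would signal completion):
{code:java}
private final BlockingQueue<FSAppAttempt> appsToProcess =
    new LinkedBlockingQueue<>();
// apps that are queued or currently being processed
private final Set<FSAppAttempt> enqueued = ConcurrentHashMap.newKeySet();

void addStarvedApp(FSAppAttempt app) {
  // add() returns false while the app is queued or in flight,
  // so the same app can never be enqueued twice
  if (enqueued.add(app)) {
    appsToProcess.add(app);
  }
}

FSAppAttempt take() throws InterruptedException {
  return appsToProcess.take();
}

// hypothetical hook: the pre-emption thread calls this when it finishes an app
void doneProcessing(FSAppAttempt app) {
  enqueued.remove(app);
}
{code}
The trade-off is that an app which becomes starved again while it is being 
processed is skipped until {{doneProcessing}} runs, but the pre-emption thread 
itself is never throttled.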


> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe, this may make one starve app is processed 
> for two times continuously.*
> For example, when app1 is *fair share starved*, it has been added to 
> appsToProcess. After that, app1 is taken but appBeingProcessed is not yet 
> update to app1. At the moment, app1 is *starved by min share*, so this app 
> is added to appsToProcess again! Because appBeingProcessed is null and 
> appsToProcess also have not this one. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-11 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765663#comment-16765663
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

testDecreaseAfterIncreaseWithAllocationExpiration is logged as YARN-5684
testContainersFromPreviousAttemptsWithRMRestart is logged as YARN-8433

Patch updated to fix the checkstyle issues that could be fixed; left the 
RMContainerImpl as is, since the changed lines line up with the current indents.

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-11 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-1655:

Attachment: YARN-1655.002.patch

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-11 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-1655:

Attachment: YARN-1655.001.patch

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-11 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764807#comment-16764807
 ] 

Wilfred Spiegelenburg edited comment on YARN-1655 at 2/11/19 9:50 AM:
--

Adding resizing to the FS.

Some background around the changes outside the FS:
# the {{RMContainerImpl}} logs a message when the temporary containers for 
resizing are released because they are in the wrong state. The new transitions 
clean those up
# Normalising requests has been moved from the {{CapacityScheduler}} into the 
{{AbstractYarnScheduler}} as it is used by both schedulers (a sketch of the 
normalisation idea follows this list).
# Resizing would only use the ANY request and leave node and rack requests 
hanging around which caused the FS to allocate strange containers. 
{{AppSchedulingInfo}} now allows for cleaning up the unneeded requests from the 
{{ContainerUpdateContext}}
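
For readers unfamiliar with point 2, a minimal, self-contained sketch of the 
normalisation idea (assumed semantics for illustration, not the actual 
{{AbstractYarnScheduler}} code): each ask is rounded up to the configured 
increment and then clamped between the minimum and maximum allocation.
{code:java}
// Hypothetical illustration of request normalisation: round the ask up
// to the nearest increment, then clamp it into [minimum, maximum].
public class NormalizeSketch {
  static long normalize(long ask, long minimum, long maximum, long increment) {
    long rounded = ((ask + increment - 1) / increment) * increment;
    return Math.min(maximum, Math.max(minimum, rounded));
  }

  public static void main(String[] args) {
    // An ask of 1000 MB with a 512 MB increment and an 8192 MB maximum
    // is normalised up to 1024 MB.
    System.out.println(normalize(1000, 512, 8192, 512));
  }
}
{code}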


was (Author: wilfreds):
Adding resizing to the FS.

Some background around the changes outside the FS:
# the {{RMContainerImpl}} logs a message when the temporary containers for 
resizing are released because they are in the wrong state. The new transitions 
clean those up
# Normalising requests has been moved from the {{CapacityScheduler}} into the 
{{AbstractYarnScheduler}} as it is used by both schedulers.
# Resizing would only use the ANY request and leave node and rack requests 
hanging around which caused the FS to allocate strange containers. 
{{AppSchedulingInfo}} now allows for cleaning up the unneeded requests from the 
{{ContainerUpdateContext}}

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-03 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8967:

Attachment: YARN-8967.005.patch

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-03 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759579#comment-16759579
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

Rebased to trunk; the mockito changes prevented the patch from applying.
The diff is basically the same, with just a two-line difference in one patch 
chunk for imports:  [^YARN-8967.005.patch] 

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9262) TestRMAppAttemptTransitions is failing with an NPE

2019-02-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758145#comment-16758145
 ] 

Wilfred Spiegelenburg commented on YARN-9262:
-

The failure occurs because no {{Allocation}} object comes back: the {{when}} 
mock call does not match the actual arguments. When the AM gets allocated we 
pass {{null}} in three places, so the {{when}} should look like this:
{code}
when(scheduler.allocate(any(ApplicationAttemptId.class), any(List.class),
    any(), any(List.class), any(), any(),
    any(ContainerUpdates.class))).
{code}
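
For context on why three of the matchers are the untyped {{any()}}: in 
Mockito 2 a typed matcher such as {{any(List.class)}} no longer matches a 
{{null}} argument, while the untyped {{any()}} does. A small, self-contained 
demo of that difference (the {{Scheduler}} interface here is a hypothetical 
stand-in, not the real scheduler API):
{code:java}
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;

public class NullMatcherDemo {
  // Hypothetical stand-in for the scheduler API, for illustration only.
  interface Scheduler {
    String allocate(List<String> asks, String label);
  }

  public static void main(String[] args) {
    Scheduler scheduler = mock(Scheduler.class);

    // Typed matchers do not match null arguments in Mockito 2+, so this
    // stub never fires for allocate(null, null).
    when(scheduler.allocate(any(List.class), any(String.class)))
        .thenReturn("typed");
    System.out.println(scheduler.allocate(null, null)); // prints: null

    // The untyped any() matches null as well, so this stub does fire.
    when(scheduler.allocate(any(), any())).thenReturn("untyped");
    System.out.println(scheduler.allocate(null, null)); // prints: untyped
  }
}
{code}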

> TestRMAppAttemptTransitions is failing with an NPE
> --
>
> Key: YARN-9262
> URL: https://issues.apache.org/jira/browse/YARN-9262
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.2.0, 3.1.2, 3.3.0
>Reporter: Sunil Govindan
>Assignee: lujie
>Priority: Critical
>
> hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions 
> fails due to an NPE post YARN-9194
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:1182)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:915)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
> {code}
> cc [~xiaoheipangzi] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-01 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758120#comment-16758120
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

I have started working on this already. I have a working code change for trunk 
based on the changes from YARN-6216. YARN-6216 by itself is not enough to 
implement the resizing; we do need some FS changes.

The only thing that is still bothering me is a new junit test I wrote that 
keeps failing. The failure is caused by lingering resource requests: there does 
not seem to be a proper clean-up of resource requests in all cases. This only 
seems to happen when an increase results in a reservation. If we then later 
cancel the increase request that caused the reservation, it just removes the 
_any_ part of the request and leaves the _node_ and _rack_ requests behind. 
This looks similar to what is mentioned in YARN-5540 around leaving requests 
behind which should not be there.

This issue does affect both schedulers but does not seem to cause a junit 
failure in the capacity scheduler.
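
To make the leftover-request problem concrete, here is a tiny, hypothetical 
model of the invariant that appears to be violated (a plain map instead of the 
real {{AppSchedulingInfo}} structures): when the _any_ ask for a scheduler key 
goes away, the _node_ and _rack_ asks under the same key have to go with it.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual AppSchedulingInfo code: cancelling
// the ANY ask for a scheduler key must also drop the node- and
// rack-local asks under that key, or the scheduler keeps allocating
// against stale requests.
public class AskCleanupSketch {
  static final String ANY = "*";

  // resource name ("*", a rack, or a host) -> outstanding container count
  static void cancelAsk(Map<String, Integer> asksByResourceName) {
    Integer anyAsk = asksByResourceName.get(ANY);
    if (anyAsk == null || anyAsk == 0) {
      // Removing only the ANY entry leaves node/rack entries behind;
      // clear the whole key instead.
      asksByResourceName.clear();
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> asks = new HashMap<>();
    asks.put(ANY, 0);     // the cancelled increase
    asks.put("rack1", 1); // leftover rack-local ask
    asks.put("host1", 1); // leftover node-local ask
    cancelAsk(asks);
    System.out.println(asks); // {} -> nothing left to allocate against
  }
}
{code}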

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-01 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YARN-1655:
---

Assignee: Wilfred Spiegelenburg  (was: Sandy Ryza)

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


