[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2015-05-01 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523613#comment-14523613
 ] 

Wangda Tan commented on YARN-2848:
--

[~cwelch], this problem no longer exists after you added 
{{CapacityHeadroomProvider}}, right? My understanding is that any 
application-specific resources needed to calculate headroom and userlimit can 
be added to {{CapacityHeadroomProvider}}.

 (FICA) Applications should maintain an application specific 'cluster' 
 resource to calculate headroom and userlimit
 --

 Key: YARN-2848
 URL: https://issues.apache.org/jira/browse/YARN-2848
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch

 Likely solutions to [YARN-1680] (properly handling node and rack blacklisting 
 with cluster level node additions and removals) will entail managing an 
 application-level slice of the cluster resource available to the 
 application for use in accurately calculating the application headroom and 
 user limit.  There is an assumption that events which impact this resource 
 will occur less frequently than the need to calculate headroom, userlimit, 
 etc. (a valid assumption, given that those calculations occur on every allocation heartbeat). 
  Given that, the application should (with assistance from cluster-level 
 code...) detect changes to the composition of the cluster (node addition, 
 removal) and when those have occurred, calculate an application specific 
 cluster resource by comparing cluster nodes to its own blacklist (both rack 
 and individual node).  I think it makes sense to include nodelabel 
 considerations into this calculation as it will be efficient to do both at 
 the same time and the single resource value reflecting both constraints could 
 then be used for efficient frequent headroom and userlimit calculations while 
 remaining highly accurate.  The application would need to be made aware of 
 nodelabel changes it is interested in (the application or removal of labels 
 of interest to the application to/from nodes).  For this purpose, the 
 application submission's nodelabel expression would be used to determine the 
 nodelabel impact on the resource used to calculate userlimit and headroom 
 (Cases where the application elected to request resources not using the 
 application level label expression are out of scope for this - but for the 
 common usecase of an application which uses a particular expression 
 throughout, userlimit and headroom would be accurate.) This could also provide 
 an overall mechanism for handling application-specific resource constraints 
 which might be added in the future.
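The per-application resource computation described above (subtracting blacklisted nodes and racks from the cluster total) could be sketched roughly as follows. This is a minimal illustration only; all class, record, and method names here are hypothetical, not actual YARN APIs:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: derive an application-specific 'cluster' resource
// by excluding blacklisted nodes and racks from the cluster total.
public class AppClusterResource {

    // Minimal stand-ins for a cluster node; memoryMb stands in for the
    // full Resource object.
    record Node(String host, String rack, long memoryMb) {}

    static long computeAppResource(List<Node> clusterNodes,
                                   Set<String> blacklistedHosts,
                                   Set<String> blacklistedRacks) {
        long total = 0;
        for (Node n : clusterNodes) {
            // Skip nodes the application has blacklisted, either
            // individually or via their rack.
            if (blacklistedHosts.contains(n.host())) continue;
            if (blacklistedRacks.contains(n.rack())) continue;
            total += n.memoryMb();
        }
        return total;
    }
}
```

The resulting value would then feed the frequent headroom and userlimit calculations, so the per-node scan only runs when the cluster composition or the blacklist actually changes.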



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2015-05-01 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523629#comment-14523629
 ] 

Craig Welch commented on YARN-2848:
---

The ResourceUsage functionality added in [YARN-3356], [YARN-3099], and 
[YARN-3092] is effectively an implementation of the approach suggested here, and 
was also used for [YARN-3463].  Given that, I'm going to close this one.  While 
it has not yet been used to address the blacklist issue with headroom 
([YARN-1680]), that should be handled there in any case.



[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2015-01-06 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266590#comment-14266590
 ] 

Chen He commented on YARN-2848:
---

I guess the label is provided by users or applications to choose which nodes to 
run on, while the blacklist is detected by the system to mark which nodes are 
not stable to run on. The blacklisted nodes could be regarded as a special 
label, or a NOT label. However, we would need an extra synchronization process 
to keep users'/apps' requests consistent with the set of unstable nodes before 
making scheduling decisions. YARN-1680 could be a solution before we actually 
settle the label scope and the synchronization overhead issue. 



[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2014-11-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1420#comment-1420
 ] 

Wangda Tan commented on YARN-2848:
--

[~cwelch],
IIUC, this JIRA is to tackle cases where an app has some special requirements 
on resource requests (including but not limited to blacklisted nodes, node 
label expressions, etc.) and the RM wants to return a headroom to the AM that 
considers such factors.
My major concern is that it will add computational complexity on the RM 
side -- we already have very heavy computation when trying to allocate 
containers (locality, hierarchy of queues, user-limit, headroom, node labels). 
If we try to resolve the problem by handling events (such as node label 
changes or blacklist changes) at the *app level*, it will be very 
problematic, since some of the operations cannot even be done in O\(n\) time.
So I think if some operation has complexity O\(n\) (where n can be as large as 
the number of apps in the cluster), we should be very cautious about it.

Any thoughts?



[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2014-11-12 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208941#comment-14208941
 ] 

Craig Welch commented on YARN-2848:
---

bq. IIUC, this JIRA is to tackle cases where an app has some special 
requirements on resource requests (including but not limited to blacklisted 
nodes, node label expressions, etc.) and the RM wants to return a headroom to 
the AM that considers such factors.

Well, yes... although the extent to which they are special isn't clear. 
[YARN-1680] surfaces this as a bug (something of a design miss...) for 
blacklisting of resources, which has been around for some time - and of course, 
node labels were recently added, but with an eye to being used - as in, there's 
a desire to use them with processes which will want accurate headroom, 
userlimit, etc. So the problem already exists, as it were; it's not something 
new we're choosing to introduce, but rather a way of resolving inconsistencies 
which exist because of functionality that is perhaps not fully complete wrt the 
rest of the system. Insofar as we want applications to work with constraints on 
the nodes they use, we will need to solve this problem in some way, or do away 
with headroom and/or user limits as such, which is not a very attractive choice.

bq. My major concern is that it will add computational complexity on the RM 
side – we already have very heavy computation when trying to allocate 
containers (locality, hierarchy of queues, user-limit, headroom, node labels)

The idea is to minimize the calculation needed during allocation by adjusting 
resources only as required by external events, which should be relatively 
infrequent for any given application.

bq. if we try to resolve the problem by handling events (such as node label 
changes or blacklist changes) at the app level, it will be very problematic, 
since some of the operations cannot even be done in O( n ) time.
bq. So I think if some operation has complexity O( n ) (where n can be as 
large as the number of apps in the cluster), we should be very cautious about it.

So, the suggestion is not to have the code which accepts a node label change, 
or a node addition or removal, synchronously notify all applications of that 
change. Rather, applications check for changes relevant to them - changes to 
the nodes held by a label they care about (label-level info), or node 
additions/removals relevant to their blacklisting (cluster-level info) - and 
adjust their resource view only when they determine it is necessary to do so. 
At the level of the cluster handling the addition or removal of a node, or 
changes to the nodes for a node label, nothing more than an indication of the 
last change to the resources needs to occur; applications simply check for the 
change indications they care about and take action as needed. It should be as 
efficient and lightweight as possible, and would not impose any O( n ) (where 
n = #apps in the cluster) operations on any single/synchronous code path.
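The pull model sketched above - a cluster-side change indicator plus a lazy per-application check - could look roughly like the following. All names here are hypothetical illustrations, not actual YARN classes:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the cluster side only bumps a version counter when
// membership or labels change (O(1), no per-app fan-out); each application
// compares the counter against its last-seen value on the allocation path
// and recomputes its resource view only when it differs.
class ClusterChangeTracker {
    private final AtomicLong version = new AtomicLong();

    // Called on node add/remove or label change.
    public void markChanged() { version.incrementAndGet(); }

    public long currentVersion() { return version.get(); }
}

class AppResourceView {
    private long seenVersion = -1;
    private long cachedResource;
    private int recomputeCount = 0; // for illustration only

    // Cheap check on every heartbeat; expensive recompute only on change.
    public long headroomBase(ClusterChangeTracker tracker) {
        long v = tracker.currentVersion();
        if (v != seenVersion) {
            cachedResource = recompute(); // e.g. scan nodes vs. blacklist
            seenVersion = v;
            recomputeCount++;
        }
        return cachedResource;
    }

    long recompute() { return 100; } // placeholder for the real scan
    public int recomputes() { return recomputeCount; }
}
```

The point of the design is that the synchronous path on a cluster change is a single counter increment, while the per-app scan is deferred until the app next asks for headroom.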


[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2014-11-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209021#comment-14209021
 ] 

Wangda Tan commented on YARN-2848:
--

[~cwelch], thanks for your explanation. I think it is valid to have such a 
mechanism of course :), I'm just concerned about the cost.

The pull model you mentioned is isomorphic to the push model (send events to 
apps, where we can also add filters to select which apps to send to). And wrt 
the pull model, we don't have a dedicated thread for each app to do that. More 
problematic, if we cannot get apps to handle such events synchronously, we 
need to prepare an event queue for apps to do so.

And I think this statement is not always true:
bq. and would not impose any O( n ) (where n = #apps in the cluster) 
operations on any single/synchronous code path
Since it is possible to change labels on a set of nodes (say 1k nodes), and 
many applications could run across those 1k nodes, some operation will scan 
the nodes and rebuild information from scratch - an O( n * m ) operation in 
very extreme cases.



[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit

2014-11-12 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209214#comment-14209214
 ] 

Craig Welch commented on YARN-2848:
---

bq. Thanks for your explanation. I think it is valid to have such a mechanism 
of course, I'm just concerned about the cost.

It sounds like you're under the impression that this is somehow 
optional/elective - I don't believe it is.  Until we implement something along 
these lines we have known defects ( [YARN-1680], 
[https://issues.apache.org/jira/browse/YARN-2496?focusedCommentId=14143993&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14143993],
 
[https://issues.apache.org/jira/browse/YARN-796?focusedCommentId=14146321&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14146321]
 ).  One way or another, some capability like this needs to be created, or we 
need to remove other functionality (headroom, userlimits), or continue to have 
significant defects/shortcomings (which is problematic, and imho not really an 
option).

bq. The pull model you mentioned is isomorphic to the push model (send events 
to apps, where we can also add filters to select which apps to send to). And 
wrt the pull model, we don't have a dedicated thread for each app to do that. 
More problematic, if we cannot get apps to handle such events synchronously, 
we need to prepare an event queue for apps to do so.

Not at all - as I've mentioned a couple of times, one option is simply to 
attach an update indicator to resources, which the app can compare against its 
own to determine whether any action needs to be taken - with the general case 
expected to be none.  That's where the efficiency of the approach comes in.  
Of course, the particulars of the implementation are what we need to work out 
here, but we do not necessarily have to have event queues, and we certainly 
don't need apps to handle events synchronously.  It's possible to take those 
approaches, but certainly not necessary.

bq. And I think this statement is not always true ... Since it is possible to 
change labels on a set of nodes (say 1k nodes), and many applications could 
run across those 1k nodes, some operation will scan the nodes and rebuild 
information from scratch - an O( n * m ) operation in very extreme cases.

If all running applications were interested in a label which changed across 
all nodes in the cluster, some activity would be necessary for them to make 
adjustments.  As a rule, this will be very infrequent in comparison to the 
frequency of allocation requests in the cluster, which is the strength of the 
approach.  Depending on how exactly we model things, it may well not be 
necessary for all applications to process all nodes of the cluster 
individually.  For example, if we limit nodes to a single label per node, then 
that could be calculated at the cluster level.  If not, tracking intersection 
values for label combinations (if limited) could eliminate the need.  
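The single-label-per-node shortcut could be sketched as an incrementally maintained per-label resource total, so that no application ever needs to scan all nodes on a label change. Names here are hypothetical illustrations, not actual YARN classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the scheduler keeps a running resource total per
// label, updated incrementally when a node's label changes, so the per-label
// cluster resource is available in O(1) instead of an O(#nodes) scan.
public class LabelResourceIndex {
    private final Map<String, Long> totalByLabel = new HashMap<>();

    // Called when a node registers; memoryMb stands in for its full Resource.
    public void addNode(String label, long memoryMb) {
        totalByLabel.merge(label, memoryMb, Long::sum);
    }

    // O(1) relabel: move the node's capacity between label buckets.
    public void relabelNode(String oldLabel, String newLabel, long memoryMb) {
        totalByLabel.merge(oldLabel, -memoryMb, Long::sum);
        totalByLabel.merge(newLabel, memoryMb, Long::sum);
    }

    public long totalFor(String label) {
        return totalByLabel.getOrDefault(label, 0L);
    }
}
```

With this, a relabel touching 1k nodes costs 1k bucket moves at the cluster level, and applications interested in a label just read the bucket, rather than each rebuilding its view from scratch.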

Putting aside possible shortcuts for a moment, however, I suspect the 
straightforward approach of recalculating only when necessary at the 
application level will actually be fine.  It's possible to posit pathological 
cases which would be problematic there, but it's possible to do that with many 
things.  If the pathological case (a change to labels or nodes of interest to 
every application at every allocation heartbeat, or a change to the set of 
cluster nodes on every heartbeat...) is not likely and does not need to be 
supported (it isn't and doesn't...), then infrequent recalculations only when 
necessary should not be problematic.  The original approach on [YARN-1680] 
would have performed that calculation with every allocation request - which we 
rightly took issue with - but doing so only when needed is considered a viable 
approach (the only realistic one I'm aware of...), which is why we're heading 
in that direction - the question is how to do it in detail.  The point of this 
jira is to note that the blacklist problem and the node label problem, in 
relation to resources available to the application, are strikingly similar in 
their needs (they're photo-negatives of one another, effectively...), and so 
it makes sense to combine them, as sharing would likely bring both runtime and 
code efficiency.

