[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297783#comment-14297783 ]
Craig Welch commented on YARN-1039:
-----------------------------------

[~chris.douglas]

bq. YARN shouldn't understand the lifecycle for a service or the progress/dependencies for task containers

That's not necessarily so. There are some cases where the type of life cycle for an application is important, for example, when determining whether it is open-ended ("service") or a batch process which entails a notion of progress ("session"), at least for purposes of display.

I think we need to rescope and clarify this jira a bit so that we can make progress. There are a number of items in the original problem statement and subsequent comments which have been taken on elsewhere and so no longer make sense to pursue here. Here's an attempt at a breakdown:

bq. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node

I think this is now clearly covered by [YARN-796]. Nodes having qualities (including operational qualities such as these) is one of the core purposes of that work, so it makes no sense to duplicate it here, and it should be de-scoped from this jira.

bq. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node

As [~ste...@apache.org] mentioned in an earlier comment [https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14038041&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14038041], affinity / anti-affinity is covered in a more general sense in [YARN-1042]. The above component of this jira is really just such a case, so it should be covered by that general solution and dropped from scope as well.
There may be some interest in informing that solution based on a generalized "service" setting, but to really understand that, the affinity approach needs to be worked out. I think the affinity approach will need to inform/integrate with this rather than the other way around, and integration should be approached as part of that effort.

That leaves nothing, so we can close the jira ;-)

Not quite, there were several things added in comments:

Token management - handled in [YARN-941]

Scheduler hints not related to node categories or anti-affinity (opportunistic scheduling, etc.) - this does strike me as something better handled via the duration route et al. [YARN-2877] [YARN-1051], and not something which needs to be replicated here.

I think that really just leaves the progress bar (and potentially other display-related items). This is covered by [YARN-1079].

I suggest, then, that we either rescope this jira to providing the lifecycle information as an application tag [https://issues.apache.org/jira/browse/YARN-1039?focusedCommentId=14039679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14039679], as suggested by [~zjshen] early on, or close it and cover the work as part of [YARN-1079]. I originally objected to that approach on the basis that tags appeared to be a display-type feature which did not fit this effort, but if rescoped as I'm proposing, it becomes such a feature, and I think that approach is now a good fit.

Thoughts?

> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
>                 Key: YARN-1039
>                 URL: https://issues.apache.org/jira/browse/YARN-1039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Steve Loughran
>            Assignee: Craig Welch
>         Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be
> used by a scheduler that would know not to host the service on a transient
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived
> containers on the same node

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)