[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549505#comment-14549505 ]
Chris Douglas commented on YARN-1039:
-------------------------------------

The semantics of a boolean flag are opaque. The policies enforced by different RM configurations (and versions) will not be, and cannot be made to be, consistent. Application and container priority are already encoded (or in progress, YARN-1963), so it's not just preemption priority or cost. Affinity and anti-affinity are also covered by different features. Discussion has been wide-ranging because it is unclear what "long-lived" guarantees across existing features (beyond removing the progress bar from the UI, which I hope we can stop mentioning).

An implementation that only recognizes infinite and undefined leases could be mapped into duration. Lease duration could also be used to communicate when security tokens cannot be renewed, short-lived guarantees for YARN-2877 containers, boundaries of YARN-1051 reservations, and planned decommissioning. In contrast, the "long-lived" flag cannot be used for these cases. We could expose probabilistic guarantees (which are what we give in reality), but that's a later issue.

Considering the blockers more concretely:

bq. (a) reservations (b) white-listed requests or (c) node-label requests getting stuck on a node used by other services' containers that don't exit.

Aren't these handled by adding a timeout to allocations, which would also catch cases where this flag is _not_ set? The timeout value could be set across the scheduler to start, but could even be user-visible in later versions...

All said, I don't have time to work on this, agree the API can be evolved from the flag, and am -0 on it.
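
To make the flag-to-duration mapping concrete, here is a minimal Java sketch. It is an illustration only: the `Lease` class and every method on it are hypothetical names, not part of any YARN API or of the YARN-1039 patches. It assumes a lease is either undefined, infinite, or bounded by a finite duration, and shows that a boolean "long-lived" flag covers only the first two cases.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch: Lease and its methods are illustrative names,
// not part of any YARN API or of the YARN-1039 patches.
final class Lease {
    private static final Lease UNDEFINED = new Lease(null, false);
    private static final Lease FOREVER = new Lease(null, true);

    private final Duration duration; // null when the lease is undefined or infinite
    private final boolean infinite;

    private Lease(Duration duration, boolean infinite) {
        this.duration = duration;
        this.infinite = infinite;
    }

    /** No statement about lifetime (what an unset flag would mean). */
    static Lease undefined() {
        return UNDEFINED;
    }

    /** The container is not expected to exit on its own. */
    static Lease forever() {
        return FOREVER;
    }

    /** A finite lease, e.g. a reservation boundary or a token expiry. */
    static Lease of(Duration duration) {
        if (duration == null || duration.isNegative()) {
            throw new IllegalArgumentException("duration must be non-negative");
        }
        return new Lease(duration, false);
    }

    /** Maps the boolean flag onto the only two leases it can express. */
    static Lease fromLongLivedFlag(boolean longLived) {
        return longLived ? FOREVER : UNDEFINED;
    }

    boolean isInfinite() {
        return infinite;
    }

    /** The finite bound, if any; empty for undefined and infinite leases. */
    Optional<Duration> bounded() {
        return Optional.ofNullable(duration);
    }

    public static void main(String[] args) {
        Lease service = Lease.fromLongLivedFlag(true);
        Lease shortTask = Lease.of(Duration.ofMinutes(10));
        System.out.println("service infinite? " + service.isInfinite());
        System.out.println("short task bound: " + shortTask.bounded());
    }
}
```

In this sketch a flag-only implementation never produces a bounded lease, while the finite cases listed above (token expiry, reservation boundaries, planned decommissioning) would all go through something like `Lease.of(...)`.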

> Add parameter for YARN resource requests to indicate "long lived"
> -----------------------------------------------------------------
>
>                 Key: YARN-1039
>                 URL: https://issues.apache.org/jira/browse/YARN-1039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 3.0.0, 2.1.1-beta
>            Reporter: Steve Loughran
>            Assignee: Craig Welch
>         Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch
>
>
> A container request could support a new parameter "long-lived". This could be
> used by a scheduler that would know not to host the service on a transient
> (cloud: spot priced) node.
> Schedulers could also decide whether or not to allocate multiple long-lived
> containers on the same node

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)