Kendall Thrapp created YARN-462:
-----------------------------------

             Summary: Project Parameter for Chargeback
                 Key: YARN-462
                 URL: https://issues.apache.org/jira/browse/YARN-462
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: resourcemanager
    Affects Versions: 0.23.6
            Reporter: Kendall Thrapp


Problem Summary

For the purpose of chargeback and better understanding of grid usage, we need 
to be able to associate applications with "projects", e.g. "pipeline X", 
"property Y".  This would allow us to aggregate on this property, thereby 
helping us compute grid resource usage for the entire "project".  Currently, 
for a given application, two things we know about it are the user that 
submitted it and the queue it was submitted to.  Below, I'll explain why 
neither of these is adequate for enterprise-level chargeback and understanding 
resource allocation needs.
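
As a rough sketch of the aggregation this would enable: the snippet below sums 
usage per project, independent of user and queue.  It is illustrative Python, 
not YARN code, and the record fields ("project", "memory_mb_seconds") and 
sample values are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical application records. "project" stands in for the proposed
# parameter; "memory_mb_seconds" is an assumed usage metric, not a YARN field.
applications = [
    {"user": "alice", "queue": "default", "project": "pipelineX", "memory_mb_seconds": 1200},
    {"user": "bob", "queue": "default", "project": "pipelineX", "memory_mb_seconds": 800},
    {"user": "alice", "queue": "adhoc", "project": "propertyY", "memory_mb_seconds": 500},
]


def usage_by_project(apps):
    """Sum the usage metric per project, regardless of user or queue."""
    totals = defaultdict(int)
    for app in apps:
        totals[app["project"]] += app["memory_mb_seconds"]
    return dict(totals)


print(usage_by_project(applications))
```

Note that neither the user nor the queue alone would give the same grouping 
in this sample data.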

Why Not Users?

It's not individual users that are paying the bill -- it's projects.  When one
of our real users submits an application on a Hadoop grid, they're presumably
not usually doing it for themselves.  They're doing work for some project or
team effort, so it's that team or project that should be "charged" for all its
users' applications.  Maintaining outside lists of associations between users
and projects is error-prone because it is time-sensitive and requires ongoing
maintenance.  New users join organizations, users leave, and users even change
projects.  Furthermore, users may split their time between multiple projects,
making it ambiguous as to which of a user's projects a given application
should be charged to.  Also, there can be headless users, which can be even
more difficult to link to a project and can be shared between teams or
projects.

Why Not Queues?

The purpose of queues is scheduling.  Overloading the queues concept to 
also mean who should be "charged" for an application can have a detrimental 
effect on the primary purpose of queues.  It could be manageable in the case of 
a very small number of projects sharing a cluster, but doesn't scale to tens or 
hundreds of projects sharing a cluster.  If a given cluster is shared between 
50 projects, creating 50 separate queues will result in inefficient use of the 
cluster resources.  Furthermore, a given project may desire more than one queue 
for different types or priorities of applications.  

Proposed Solution

Rather than relying on external tools to infer through the user and/or queue
whom to "charge" for a given application, I propose a straightforward approach
where that information is explicitly supplied when the application is
submitted, just like we do with queues.  Let's use a charge card analogy: when
you buy something online, you don't just say who you are and how to ship it,
you also specify how you're paying for it.  Similarly, when submitting an
application in YARN, you could explicitly specify with whom its resource usage
should be associated (a project, team, cost center, etc.).

This new configuration parameter should default to being optional, so that 
organizations not interested in chargeback or project-level resource tracking 
can happily continue on as if it weren't there.  However, it should be
configurable at the cluster level such that a given cluster could elect to
make it required, so that all applications would have an associated project.
The value of this new parameter should be exposed via the Resource Manager UI 
and Resource Manager REST API, so that users and tools can make use of it for 
chargeback, utilization metrics, etc.
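
To make the optional-by-default behavior concrete, here is a minimal sketch of 
the submission-time check.  It is illustrative Python rather than YARN code, 
and the function name and the "project_required" cluster setting are 
hypothetical names chosen for this example.

```python
def validate_project(app_project, project_required=False):
    """Decide whether a submission is acceptable with respect to its project.

    app_project:      the project supplied at submission, or None if omitted.
    project_required: hypothetical cluster-level setting; False by default so
                      clusters that ignore the feature are unaffected.
    Returns a (accepted, reason) tuple.
    """
    if app_project:
        return True, "project supplied"
    if project_required:
        return False, "project is required on this cluster"
    return True, "project not supplied; optional on this cluster"


# Default cluster: omitting the project is fine.
print(validate_project(None))
# Cluster that elected to make the project required.
print(validate_project(None, project_required=True))
```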

I'm undecided on what to name the new parameter, as I like the flexibility in 
the ways it could be used.  It is essentially just an additional party other 
than user or queue that an application can be associated with, so its use is 
not just limited to a chargeback scenario.  For example, an organization not 
interested in chargeback could still use this parameter to communicate useful 
information about an application (e.g. pipelineX.stageN) and aggregate like 
applications.

Enforcement

Couldn't users just specify this information as a prefix for their job names?
Yes, but the missing piece this parameter could provide is enforcement.
Ideally, I'd like this parameter to work very much like how the queues work.
As is already the case with queues, it'd be ideal if a given user couldn't
just specify any old value for this parameter.  It could be configurable such
that a given user only has permission to submit applications for specific
"projects".  Submitting an application with this parameter set to anything
other than what the given user is allowed would cause the application to be
rejected in the same manner as if the user had specified an invalid queue.

Again, so as to have no effect on organizations not interested in this feature, 
this enforcement should be off by default, but configurable at the cluster 
level such that it could be turned on for clusters wanting to use it.
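
The enforcement described above could look roughly like the following sketch.  
It is illustrative Python, not YARN code: the exception class, the function, 
the "enforce_projects" flag, and the per-user project mapping are all 
hypothetical names invented for this example.

```python
class SubmissionRejected(Exception):
    """Raised when a submission fails a queue or project check."""


def check_submission(user, queue, project, valid_queues, user_projects,
                     enforce_projects=False):
    """Reject a submission the same way for a bad project as for a bad queue.

    user_projects:    hypothetical mapping of user -> set of permitted projects.
    enforce_projects: hypothetical cluster-level flag, off by default.
    """
    if queue not in valid_queues:
        raise SubmissionRejected(f"invalid queue: {queue}")
    if enforce_projects and project not in user_projects.get(user, set()):
        raise SubmissionRejected(
            f"user {user} may not submit for project: {project}")


valid_queues = {"default", "adhoc"}
user_projects = {"alice": {"pipelineX"}}

# Enforcement off (the default): any project value is accepted.
check_submission("alice", "default", "propertyY", valid_queues, user_projects)

# Enforcement on: a project outside the user's permitted set is rejected.
try:
    check_submission("alice", "default", "propertyY", valid_queues,
                     user_projects, enforce_projects=True)
except SubmissionRejected as err:
    print("rejected:", err)
```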

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
