[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310136#comment-14310136 ] Kendall Thrapp commented on YARN-3143: -- Thanks [~jlowe] for debugging and the super quick patch and thanks [~eepayne] and [~kihwal] for reviewing. > RM Apps REST API can return NPE or entries missing id and other fields > -- > > Key: YARN-3143 > URL: https://issues.apache.org/jira/browse/YARN-3143 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.2 >Reporter: Kendall Thrapp >Assignee: Jason Lowe > Fix For: 2.7.0 > > Attachments: YARN-3143.001.patch > > > I'm seeing intermittent null pointer exceptions being returned by > the YARN Apps REST API. > For example: > {code} > http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED > {code} > JSON Response was: > {code} > {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} > {code} > At a glance appears to be only when we query for unfinished apps (i.e. > finalStatus=UNDEFINED). > Possibly related, when I do get back a list of apps, sometimes one or more of > the apps will be missing most of the fields, like id, name, user, etc., and > the fields that are present all have zero for the value. > For example: > {code} > {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} > {code} > Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
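Until a fixed ResourceManager is deployed, a client could separate the zero-valued entries from the usable ones instead of failing on them. The following is only a sketch of that workaround, not the fix attached to this issue. It assumes the response has already been fetched and decoded and that entries sit under `apps.app`, as in the Cluster Applications API documentation. The helper name `split_apps`, the choice of required fields, and the first sample entry are made up for illustration.

```python
import json

# Fields the report says go missing on the malformed entries.
REQUIRED_FIELDS = ("id", "name", "user")


def split_apps(payload):
    """Split a decoded apps response into (complete, malformed) entries.

    Assumes the {"apps": {"app": [...]}} layout; a null "apps" value
    (no matching applications) is treated as an empty list.
    """
    if "RemoteException" in payload:
        # The NPE case from the report: surface it instead of hiding it.
        raise RuntimeError(payload["RemoteException"].get("exception", "unknown"))
    apps = (payload.get("apps") or {}).get("app") or []
    complete = [a for a in apps if all(f in a for f in REQUIRED_FIELDS)]
    malformed = [a for a in apps if not all(f in a for f in REQUIRED_FIELDS)]
    return complete, malformed


# One well-formed entry (hypothetical values) and a shortened copy of the
# zero-valued entry quoted in the report.
sample = json.loads(
    '{"apps":{"app":['
    '{"id":"application_1_0001","name":"demo","user":"alice","progress":50.0},'
    '{"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0}'
    ']}}'
)
complete, malformed = split_apps(sample)
print(len(complete), len(malformed))
```

Raising on `RemoteException` rather than returning an empty list keeps the intermittent NullPointerException visible to the caller, who can then retry or report it.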
[jira] [Updated] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kendall Thrapp updated YARN-3143: - Description: I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} {code} Let me know if there's any other information I can provide to help debug. was: I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED JSON Response was: {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. 
For example: {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} Let me know if there's any other information I can provide to help debug. > RM Apps REST API can return NPE or entries missing id and other fields > -- > > Key: YARN-3143 > URL: https://issues.apache.org/jira/browse/YARN-3143 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.5.2 >Reporter: Kendall Thrapp > > I'm seeing intermittent null pointer exceptions being returned by > the YARN Apps REST API. > For example: > {code} > http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED > {code} > JSON Response was: > {code} > {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} > {code} > At a glance appears to be only when we query for unfinished apps (i.e. > finalStatus=UNDEFINED). > Possibly related, when I do get back a list of apps, sometimes one or more of > the apps will be missing most of the fields, like id, name, user, etc., and > the fields that are present all have zero for the value. > For example: > {code} > {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} > {code} > Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
Kendall Thrapp created YARN-3143: Summary: RM Apps REST API can return NPE or entries missing id and other fields Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED JSON Response was: {"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {"progress":0.0,"clusterId":0,"applicationTags":"","startedTime":0,"finishedTime":0,"elapsedTime":0,"allocatedMB":0,"allocatedVCores":0,"runningContainers":0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139336#comment-14139336 ] Kendall Thrapp commented on YARN-415: - Thanks [~eepayne], [~aklochkov], [~jianhe], [~wangda], [~kasha], [~sandyr] and [~jlowe] for all your effort on this! Looking forward to being able to use this feature. > Capture aggregate memory allocation at the app-level for chargeback > --- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Kendall Thrapp >Assignee: Eric Payne > Fix For: 2.6.0 > > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, > YARN-415.201408062232.txt, YARN-415.201408080204.txt, > YARN-415.201408092006.txt, YARN-415.201408132109.txt, > YARN-415.201408150030.txt, YARN-415.201408181938.txt, > YARN-415.201408181938.txt, YARN-415.201408212033.txt, > YARN-415.201409040036.txt, YARN-415.201409092204.txt, > YARN-415.201409102216.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. 
> (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
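The metric requested above is a sum, over an application's containers, of reserved memory multiplied by container lifetime. A minimal sketch of that arithmetic follows. The `ContainerUsage` type, its field names, and the sample numbers are hypothetical and are not taken from any of the attached patches.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    reserved_mb: int   # memory reserved for the container, in MB
    start_s: int       # container start time, in seconds
    end_s: int         # container finish time, in seconds


def memory_mb_seconds(containers):
    """Sum reserved memory times lifetime over all containers of an app."""
    return sum(c.reserved_mb * (c.end_s - c.start_s) for c in containers)


app_containers = [
    ContainerUsage(reserved_mb=1024, start_s=0, end_s=60),
    ContainerUsage(reserved_mb=2048, start_s=10, end_s=40),
]
print(memory_mb_seconds(app_containers))
```

For the two sample containers this is 1024 * 60 + 2048 * 30 = 122880 MB-seconds. The calculation uses the reserved amount, not the memory actually used, which matches the chargeback reasoning in the description.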
[jira] [Updated] (YARN-415) Capture memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kendall Thrapp updated YARN-415: Summary: Capture memory allocation at the app-level for chargeback (was: Capture memory utilization at the app-level for chargeback) > Capture memory allocation at the app-level for chargeback > - > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, > YARN-415.201408062232.txt, YARN-415.201408080204.txt, > YARN-415.201408092006.txt, YARN-415.201408132109.txt, > YARN-415.201408150030.txt, YARN-415.201408181938.txt, > YARN-415.201408181938.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. 
We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105569#comment-14105569 ] Kendall Thrapp commented on YARN-415: - Updated the JIRA title to say allocation instead of utilization. > Capture memory allocation at the app-level for chargeback > - > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, > YARN-415.201408062232.txt, YARN-415.201408080204.txt, > YARN-415.201408092006.txt, YARN-415.201408132109.txt, > YARN-415.201408150030.txt, YARN-415.201408181938.txt, > YARN-415.201408181938.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. 
We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101445#comment-14101445 ] Kendall Thrapp commented on YARN-415: - {quote} 1. Is the chargeback simply to track the usage and may be financially charge the users. Or, is to influence future scheduling decisions? I agree that the RM should facilitate collecting this information, but should the collected info be available to the RM for future use? If not, do we want the RM to serve this info? {quote} In addition to the goals [~eepayne] listed, another goal is to make it easier for users to compare how code changes to a particular recurring Hadoop job affect its resource usage. Assuming the input data size didn't change significantly, it'd be much more apparent to the user after a code change if there was a resulting significant change in the resource usage for their job. Even without charging, I'm hoping that having the resource usage shown to the user, without any extra work on their part, will make more people think about their overall grid resource usage, instead of just run times.
> Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, > YARN-415.201408062232.txt, YARN-415.201408080204.txt, > YARN-415.201408092006.txt, YARN-415.201408132109.txt, > YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. 
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978384#comment-13978384 ] Kendall Thrapp commented on YARN-415: - Hi Andrey, any update on this? Thanks! > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865739#comment-13865739 ] Kendall Thrapp commented on YARN-415: - Thanks Andrey for all your work on this! I'm looking forward to being able to use this. Any updates? > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798511#comment-13798511 ] Kendall Thrapp commented on YARN-415: - Thanks Andrey for implementing this. I'm looking forward to being able to use it. Just a reminder to also update the REST API docs (http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API). > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n2.patch, YARN-415--n3.patch, > YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, > YARN-415--n7.patch, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-691) Invalid NaN values in Hadoop REST API JSON response
Kendall Thrapp created YARN-691: --- Summary: Invalid NaN values in Hadoop REST API JSON response Key: YARN-691 URL: https://issues.apache.org/jira/browse/YARN-691 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp I've been occasionally coming across instances where Hadoop's Cluster Applications REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API) has returned JSON that PHP's json_decode function failed to parse. I've tracked the syntax error down to the presence of the unquoted word NaN appearing as a value in the JSON. For example: "progress":NaN, NaN is not part of the JSON spec, so its presence renders the whole JSON string invalid. Hadoop needs to return something other than NaN in this case -- perhaps an empty string or the quoted string "NaN". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
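The failure can be reproduced outside PHP. The sketch below shows a strict parse that rejects the unquoted NaN, and one possible way for a serializer to avoid emitting it. Note that it maps non-finite values to JSON null, which is a substitute for, not one of, the two options suggested in the report (an empty string or the quoted string "NaN"). The helper names `strict_loads` and `finite_or_none` are hypothetical.

```python
import json
import math


def strict_loads(text):
    """Parse JSON, rejecting the non-standard NaN/Infinity literals."""
    def reject(name):
        raise ValueError("non-standard JSON constant: " + name)
    # Python's json module accepts NaN by default; parse_constant lets us refuse it.
    return json.loads(text, parse_constant=reject)


def finite_or_none(value):
    """Map NaN/Infinity to None so the serializer emits JSON null."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


bad = '{"progress":NaN}'
try:
    strict_loads(bad)
    rejected = False
except ValueError:
    rejected = True

# allow_nan=False makes json.dumps raise if a NaN slips through unsanitized.
fixed = json.dumps({"progress": finite_or_none(float("nan"))}, allow_nan=False)
print(rejected, fixed)
```

Whichever replacement value is chosen, clients that treat `progress` as a number will need to handle the non-numeric case explicitly.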
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606949#comment-13606949 ] Kendall Thrapp commented on YARN-462: - Karthik, I think your suggestion for transparent project queues under the leaf queues is an interesting idea and would also meet my requirements. > Project Parameter for Chargeback > > > Key: YARN-462 > URL: https://issues.apache.org/jira/browse/YARN-462 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp > > Problem Summary > For the purpose of chargeback and better understanding of grid usage, we need > to be able to associate applications with "projects", e.g. "pipeline X", > "property Y". This would allow us to aggregate on this property, thereby > helping us compute grid resource usage for the entire "project". Currently, > for a given application, two things we know about it are the user that > submitted it and the queue it was submitted to. Below, I'll explain why > neither of these is adequate for enterprise-level chargeback and > understanding resource allocation needs. > Why Not Users? > It's not individual users that are paying the bill -- it's projects. When one > of our real users submits an application on a Hadoop grid, they're presumably > not usually doing it for themselves. They're doing work for some project or > team effort, so it's that team or project that should be "charged" for all its > users' applications. Maintaining outside lists of associations between users > and projects is error-prone because it is time-sensitive and requires > continued ongoing maintenance. New users join organizations, users leave and > users even change projects. Furthermore, users may split their time between > multiple projects, making it ambiguous as to which of a user's projects a > given application should be charged.
Also, there can be headless users, > which can be even more difficult to link to a project and can be shared > between teams or projects. > Why Not Queues? > The purpose of queues is for scheduling. Overloading the queues concept to > also mean who should be "charged" for an application can have a detrimental > effect on the primary purpose of queues. It could be manageable in the case > of a very small number of projects sharing a cluster, but doesn't scale to > tens or hundreds of projects sharing a cluster. If a given cluster is shared > between 50 projects, creating 50 separate queues will result in inefficient > use of the cluster resources. Furthermore, a given project may desire more > than one queue for different types or priorities of applications. > Proposed Solution > Rather than relying on external tools to infer through the user and/or queue > who to "charge" for a given application, I propose a straightforward approach > where that information be explicitly supplied when the application is > submitted, just like we do with queues. Let's use a charge card analogy: > when you buy something online, you don't just say who you are and how to ship > it, you also specify how you're paying for it. Similarly, when submitting an > application in YARN, you could explicitly specify to whom its resource usage > should be associated (a project, team, cost center, etc). > This new configuration parameter should default to being optional, so that > organizations not interested in chargeback or project-level resource tracking > can happily continue on as if it wasn't there. However, it should be > configurable at the cluster-level such that a given cluster could elect > to make it required, so that all applications would have an associated > project. The value of this new parameter should be exposed via the Resource > Manager UI and Resource Manager REST API, so that users and tools can make > use of it for chargeback, utilization metrics, etc.
> I'm undecided on what to name the new parameter, as I like the flexibility in > the ways it could be used. It is essentially just an additional party other > than user or queue that an application can be associated with, so its use is > not just limited to a chargeback scenario. For example, an organization not > interested in chargeback could still use this parameter to communicate useful > information about an application (e.g. pipelineX.stageN) and aggregate like > applications. > Enforcement > Couldn't users just specify this information as a prefix for their job names? > Yes, but the missing piece this could provide is enforcement. Ideally, I'd > like this parameter to work very much like how the queues work. Like already > exists with queues, it'd be ideal if a given user couldn't just specify any > old value for this pa
[jira] [Updated] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
[ https://issues.apache.org/jira/browse/YARN-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kendall Thrapp updated YARN-473: Description: The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. was: The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is als showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps. 
> Capacity Scheduler webpage and REST API not showing correct number of pending > applications > -- > > Key: YARN-473 > URL: https://issues.apache.org/jira/browse/YARN-473 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp > > The Capacity Scheduler REST API > (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) > is not returning the correct number of pending applications. > numPendingApplications is almost always zero, even if there are dozens of > pending apps. > In investigating this, I discovered that the Resource Manager's Scheduler > webpage is also showing an incorrect but different number of pending > applications. For example, the cluster I'm looking at right now currently > has 15 applications in the ACCEPTED state, but the Cluster Metrics table near > the top of the page says there are only 2 pending apps. The REST API says > there are zero pending apps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
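The discrepancy can be checked mechanically by counting ACCEPTED applications in the apps list and comparing the result with a reported pending count. The sketch below assumes, as the report does, that ACCEPTED applications are the ones that should be counted as pending, and that the apps response uses the `apps.app` layout with a `state` field per entry. The function names and the sample payload are hypothetical.

```python
def count_accepted(apps_payload):
    """Count applications whose state is ACCEPTED in a decoded apps response."""
    apps = (apps_payload.get("apps") or {}).get("app") or []
    return sum(1 for app in apps if app.get("state") == "ACCEPTED")


def check_pending(apps_payload, reported_pending):
    """Return (actual, reported, consistent) for a reported pending count."""
    actual = count_accepted(apps_payload)
    return actual, reported_pending, actual == reported_pending


# Made-up payload mirroring the numbers in the report: 15 ACCEPTED apps.
payload = {"apps": {"app": [{"state": "ACCEPTED"}] * 15 + [{"state": "RUNNING"}] * 3}}

print(check_pending(payload, 0))  # value reported by the REST API
print(check_pending(payload, 2))  # value shown in the Cluster Metrics table
```

Both checks come back inconsistent for the numbers in the report, since neither 0 nor 2 matches the 15 applications in the ACCEPTED state.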
[jira] [Created] (YARN-473) Capacity Scheduler webpage and REST API not showing correct number of pending applications
Kendall Thrapp created YARN-473: --- Summary: Capacity Scheduler webpage and REST API not showing correct number of pending applications Key: YARN-473 URL: https://issues.apache.org/jira/browse/YARN-473 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.6 Reporter: Kendall Thrapp The Capacity Scheduler REST API (http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API) is not returning the correct number of pending applications. numPendingApplications is almost always zero, even if there are dozens of pending apps. In investigating this, I discovered that the Resource Manager's Scheduler webpage is also showing an incorrect but different number of pending applications. For example, the cluster I'm looking at right now currently has 15 applications in the ACCEPTED state, but the Cluster Metrics table near the top of the page says there are only 2 pending apps. The REST API says there are zero pending apps.
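The mismatch described in YARN-473 can be checked mechanically by comparing the pending count reported by the scheduler endpoint with the number of applications in the ACCEPTED state. The following is a minimal sketch, not the RM's own logic. It assumes the two REST responses have already been fetched and parsed into dictionaries, and the sample payloads, queue names and application ids are made up for illustration. It also assumes that only leaf queues carry numPendingApplications and that the apps response nests its list under "apps" and "app".

```python
def pending_from_scheduler(node):
    """Recursively sum numPendingApplications over leaf queues.

    Assumption: a leaf queue is a dict that has the counter but no
    nested "queues" entry.
    """
    if isinstance(node, list):
        return sum(pending_from_scheduler(n) for n in node)
    if not isinstance(node, dict):
        return 0
    total = 0
    if "numPendingApplications" in node and "queues" not in node:
        total += node["numPendingApplications"]
    for value in node.values():
        total += pending_from_scheduler(value)
    return total


def accepted_count(apps_response):
    """Count applications whose state is ACCEPTED."""
    apps = (apps_response.get("apps") or {}).get("app", [])
    return sum(1 for a in apps if a.get("state") == "ACCEPTED")


# Hypothetical, hand-written payloads standing in for real responses.
sample_scheduler = {
    "scheduler": {
        "schedulerInfo": {
            "type": "capacityScheduler",
            "queues": {
                "queue": [
                    {"queueName": "a", "numPendingApplications": 0},
                    {"queueName": "b", "numPendingApplications": 0},
                ]
            },
        }
    }
}
sample_apps = {
    "apps": {
        "app": [
            {"id": "application_1", "state": "ACCEPTED"},
            {"id": "application_2", "state": "RUNNING"},
            {"id": "application_3", "state": "ACCEPTED"},
        ]
    }
}

reported = pending_from_scheduler(sample_scheduler)
accepted = accepted_count(sample_apps)
if reported != accepted:
    print(f"mismatch: scheduler reports {reported}, {accepted} apps are ACCEPTED")
```

Run periodically against a live cluster, a check like this would show how often and by how much the two numbers diverge.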
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601258#comment-13601258 ] Kendall Thrapp commented on YARN-462: - And yes, the case where entity A (real or headless user) is part of two other entities (teams or projects) B and C and submits jobs to both queues is one of the tricky issues I'm hoping to solve. Another case is where last week user A was part of team B but this week is part of team C, and I don't want any ambiguity in attributing user A's resource usage to the correct team, no matter which day's metrics I'm looking at. In large enough organizations, that's not necessarily a rare occurrence. > Project Parameter for Chargeback > > > Key: YARN-462 > URL: https://issues.apache.org/jira/browse/YARN-462 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp > > Problem Summary > For the purpose of chargeback and better understanding of grid usage, we need > to be able to associate applications with "projects", e.g. "pipeline X", > "property Y". This would allow us to aggregate on this property, thereby > helping us compute grid resource usage for the entire "project". Currently, > for a given application, two things we know about it are the user that > submitted it and the queue it was submitted to. Below, I'll explain why > neither of these is adequate for enterprise-level chargeback and > understanding resource allocation needs. > Why Not Users? > It's not individual users that are paying the bill -- it's projects. When one > of our real users submits an application on a Hadoop grid, they're presumably > not usually doing it for themselves. They're doing work for some project or > team effort, so it's that team or project that should be "charged" for all its > users' applications. 
Maintaining outside lists of associations between users > and projects is error-prone because it is time-sensitive and requires > ongoing maintenance. New users join organizations, users leave and > users even change projects. Furthermore, users may split their time between > multiple projects, making it ambiguous as to which of a user's projects a > given application should be charged. Also, there can be headless users, > which can be even more difficult to link to a project and can be shared > between teams or projects. > Why Not Queues? > The purpose of queues is for scheduling. Overloading the queues concept to > also mean who should be "charged" for an application can have a detrimental > effect on the primary purpose of queues. It could be manageable in the case > of a very small number of projects sharing a cluster, but doesn't scale to > tens or hundreds of projects sharing a cluster. If a given cluster is shared > between 50 projects, creating 50 separate queues will result in inefficient > use of the cluster resources. Furthermore, a given project may desire more > than one queue for different types or priorities of applications. > Proposed Solution > Rather than relying on external tools to infer through the user and/or queue > who to "charge" for a given application, I propose a straightforward approach > where that information be explicitly supplied when the application is > submitted, just like we do with queues. Let's use a charge card analogy: > when you buy something online, you don't just say who you are and how to ship > it, you also specify how you're paying for it. Similarly, when submitting an > application in YARN, you could explicitly specify to whom its resource usage > should be associated (a project, team, cost center, etc.). > This new configuration parameter should default to being optional, so that > organizations not interested in chargeback or project-level resource tracking > can happily continue on as if it wasn't there. 
However, it should be > configurable at the cluster level such that a given cluster could elect > to make it required, so that all applications would have an associated > project. The value of this new parameter should be exposed via the Resource > Manager UI and Resource Manager REST API, so that users and tools can make > use of it for chargeback, utilization metrics, etc. > I'm undecided on what to name the new parameter, as I like the flexibility in > the ways it could be used. It is essentially just an additional party other > than user or queue that an application can be associated with, so its use is > not just limited to a chargeback scenario. For example, an organization not > interested in chargeback could still use this parameter to communicate useful > information about an application (e.g. pipelineX.stageN) and aggregate like > applicatio
[jira] [Commented] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601245#comment-13601245 ] Kendall Thrapp commented on YARN-462: - Thanks for the questions and feedback. Yes, first I should clarify what I intended by chargeback. I'm looking to be able to quantify cluster resource usage (memory, CPU, HDFS, etc.) for every application, and then roll that up to the project level. This would allow us to accurately charge the customer (i.e. team/project) for their grid usage (either literally or just informatively). I want to provide an incentive for more efficient coding, as well as make it easier for teams to compare their resource usage across different software versions of their Hadoop applications, config parameter changes, etc. I had originally hoped that hierarchical queues could serve this purpose as well, but have since run into several issues with this approach. The first is that it doesn't scale for clusters with large numbers of projects. I've seen large clusters shared between over a hundred different projects, each with their own teams of users. If I recall correctly, queues can't be assigned less than 1% of the total capacity, so it wouldn't be possible to give each of these projects its own queue. Even if we could, I suspect this could result in too much overhead for the scheduler and too much fragmentation of the cluster resources, which could result in poorer overall utilization. The second issue is that the project-per-queue approach conflicts with how I see users wanting to use our queues. In many cases I see queues being used to distinguish application priorities, ensuring that high-priority, time-sensitive jobs get the resources they need to finish on time, while big but lower-priority and less time-sensitive jobs are constrained by being in a smaller queue. 
I'd expect a lot of pushback from our users for any chargeback-focused queue configuration that had a negative impact on job run times and meeting SLAs. The idea of the project/chargeback parameter decouples the two. > Project Parameter for Chargeback > > > Key: YARN-462 > URL: https://issues.apache.org/jira/browse/YARN-462 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp > > Problem Summary > For the purpose of chargeback and better understanding of grid usage, we need > to be able to associate applications with "projects", e.g. "pipeline X", > "property Y". This would allow us to aggregate on this property, thereby > helping us compute grid resource usage for the entire "project". Currently, > for a given application, two things we know about it are the user that > submitted it and the queue it was submitted to. Below, I'll explain why > neither of these is adequate for enterprise-level chargeback and > understanding resource allocation needs. > Why Not Users? > It's not individual users that are paying the bill -- it's projects. When one > of our real users submits an application on a Hadoop grid, they're presumably > not usually doing it for themselves. They're doing work for some project or > team effort, so it's that team or project that should be "charged" for all its > users' applications. Maintaining outside lists of associations between users > and projects is error-prone because it is time-sensitive and requires > ongoing maintenance. New users join organizations, users leave and > users even change projects. Furthermore, users may split their time between > multiple projects, making it ambiguous as to which of a user's projects a > given application should be charged. Also, there can be headless users, > which can be even more difficult to link to a project and can be shared > between teams or projects. > Why Not Queues? > The purpose of queues is for scheduling. 
Overloading the queues concept to > also mean who should be "charged" for an application can have a detrimental > effect on the primary purpose of queues. It could be manageable in the case > of a very small number of projects sharing a cluster, but doesn't scale to > tens or hundreds of projects sharing a cluster. If a given cluster is shared > between 50 projects, creating 50 separate queues will result in inefficient > use of the cluster resources. Furthermore, a given project may desire more > than one queue for different types or priorities of applications. > Proposed Solution > Rather than relying on external tools to infer through the user and/or queue > who to "charge" for a given application, I propose a straightforward approach > where that information be explicitly supplied when the application is > submitted, just like we do with queues. Let's use a cha
[jira] [Updated] (YARN-462) Project Parameter for Chargeback
[ https://issues.apache.org/jira/browse/YARN-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kendall Thrapp updated YARN-462: Issue Type: New Feature (was: Improvement) > Project Parameter for Chargeback > > > Key: YARN-462 > URL: https://issues.apache.org/jira/browse/YARN-462 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp > > Problem Summary > For the purpose of chargeback and better understanding of grid usage, we need > to be able to associate applications with "projects", e.g. "pipeline X", > "property Y". This would allow us to aggregate on this property, thereby > helping us compute grid resource usage for the entire "project". Currently, > for a given application, two things we know about it are the user that > submitted it and the queue it was submitted to. Below, I'll explain why > neither of these is adequate for enterprise-level chargeback and > understanding resource allocation needs. > Why Not Users? > It's not individual users that are paying the bill -- it's projects. When one > of our real users submits an application on a Hadoop grid, they're presumably > not usually doing it for themselves. They're doing work for some project or > team effort, so it's that team or project that should be "charged" for all its > users' applications. Maintaining outside lists of associations between users > and projects is error-prone because it is time-sensitive and requires > ongoing maintenance. New users join organizations, users leave and > users even change projects. Furthermore, users may split their time between > multiple projects, making it ambiguous as to which of a user's projects a > given application should be charged. Also, there can be headless users, > which can be even more difficult to link to a project and can be shared > between teams or projects. > Why Not Queues? > The purpose of queues is for scheduling. 
Overloading the queues concept to > also mean who should be "charged" for an application can have a detrimental > effect on the primary purpose of queues. It could be manageable in the case > of a very small number of projects sharing a cluster, but doesn't scale to > tens or hundreds of projects sharing a cluster. If a given cluster is shared > between 50 projects, creating 50 separate queues will result in inefficient > use of the cluster resources. Furthermore, a given project may desire more > than one queue for different types or priorities of applications. > Proposed Solution > Rather than relying on external tools to infer through the user and/or queue > who to "charge" for a given application, I propose a straightforward approach > where that information be explicitly supplied when the application is > submitted, just like we do with queues. Let's use a charge card analogy: > when you buy something online, you don't just say who you are and how to ship > it, you also specify how you're paying for it. Similarly, when submitting an > application in YARN, you could explicitly specify to whom its resource usage > should be associated (a project, team, cost center, etc.). > This new configuration parameter should default to being optional, so that > organizations not interested in chargeback or project-level resource tracking > can happily continue on as if it wasn't there. However, it should be > configurable at the cluster level such that a given cluster could elect > to make it required, so that all applications would have an associated > project. The value of this new parameter should be exposed via the Resource > Manager UI and Resource Manager REST API, so that users and tools can make > use of it for chargeback, utilization metrics, etc. > I'm undecided on what to name the new parameter, as I like the flexibility in > the ways it could be used. 
It is essentially just an additional party other > than user or queue that an application can be associated with, so its use is > not just limited to a chargeback scenario. For example, an organization not > interested in chargeback could still use this parameter to communicate useful > information about an application (e.g. pipelineX.stageN) and aggregate like > applications. > Enforcement > Couldn't users just specify this information as a prefix for their job names? > Yes, but the missing piece this could provide is enforcement. Ideally, I'd > like this parameter to work very much like how the queues work. As is already > the case with queues, it'd be ideal if a given user couldn't just specify any > old value for this parameter. It could be configurable such that a given > user only has permission to submit applications for specific "projects". > Submitting an application with this
[jira] [Created] (YARN-462) Project Parameter for Chargeback
Kendall Thrapp created YARN-462: --- Summary: Project Parameter for Chargeback Key: YARN-462 URL: https://issues.apache.org/jira/browse/YARN-462 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Problem Summary For the purpose of chargeback and better understanding of grid usage, we need to be able to associate applications with "projects", e.g. "pipeline X", "property Y". This would allow us to aggregate on this property, thereby helping us compute grid resource usage for the entire "project". Currently, for a given application, two things we know about it are the user that submitted it and the queue it was submitted to. Below, I'll explain why neither of these is adequate for enterprise-level chargeback and understanding resource allocation needs. Why Not Users? It's not individual users that are paying the bill -- it's projects. When one of our real users submits an application on a Hadoop grid, they're presumably not usually doing it for themselves. They're doing work for some project or team effort, so it's that team or project that should be "charged" for all its users' applications. Maintaining outside lists of associations between users and projects is error-prone because it is time-sensitive and requires ongoing maintenance. New users join organizations, users leave and users even change projects. Furthermore, users may split their time between multiple projects, making it ambiguous as to which of a user's projects a given application should be charged. Also, there can be headless users, which can be even more difficult to link to a project and can be shared between teams or projects. Why Not Queues? The purpose of queues is for scheduling. Overloading the queues concept to also mean who should be "charged" for an application can have a detrimental effect on the primary purpose of queues. 
It could be manageable in the case of a very small number of projects sharing a cluster, but doesn't scale to tens or hundreds of projects sharing a cluster. If a given cluster is shared between 50 projects, creating 50 separate queues will result in inefficient use of the cluster resources. Furthermore, a given project may desire more than one queue for different types or priorities of applications. Proposed Solution Rather than relying on external tools to infer through the user and/or queue who to "charge" for a given application, I propose a straightforward approach where that information be explicitly supplied when the application is submitted, just like we do with queues. Let's use a charge card analogy: when you buy something online, you don't just say who you are and how to ship it, you also specify how you're paying for it. Similarly, when submitting an application in YARN, you could explicitly specify to whom its resource usage should be associated (a project, team, cost center, etc.). This new configuration parameter should default to being optional, so that organizations not interested in chargeback or project-level resource tracking can happily continue on as if it wasn't there. However, it should be configurable at the cluster level such that a given cluster could elect to make it required, so that all applications would have an associated project. The value of this new parameter should be exposed via the Resource Manager UI and Resource Manager REST API, so that users and tools can make use of it for chargeback, utilization metrics, etc. I'm undecided on what to name the new parameter, as I like the flexibility in the ways it could be used. It is essentially just an additional party other than user or queue that an application can be associated with, so its use is not just limited to a chargeback scenario. 
For example, an organization not interested in chargeback could still use this parameter to communicate useful information about an application (e.g. pipelineX.stageN) and aggregate like applications. Enforcement Couldn't users just specify this information as a prefix for their job names? Yes, but the missing piece this could provide is enforcement. Ideally, I'd like this parameter to work very much like how the queues work. As is already the case with queues, it'd be ideal if a given user couldn't just specify any old value for this parameter. It could be configurable such that a given user only has permission to submit applications for specific "projects". Submitting an application with this parameter set to anything other than what the given user is allowed would cause the application to be rejected, in the same manner as if the user had specified an invalid queue. Again, so as to have no effect on organizations not interested in this feature, this enforcement should be off by default, but config
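The behavior proposed in YARN-462 (optional by default, optionally required per cluster, optionally enforced per user) can be illustrated with a small submission-time check. This is only a sketch of the idea as described in the issue, not an existing YARN API. The function name, the `required` and `enforce` switches, the exception type and the user and project names are all hypothetical.

```python
class SubmissionRejected(Exception):
    """Raised when a submission fails the project check (hypothetical)."""


def check_project(user, project, *, required=False, enforce=False, allowed=None):
    """Validate the project given at submission time.

    required: cluster-level switch that makes the parameter mandatory.
    enforce:  cluster-level switch that restricts users to listed projects.
    allowed:  mapping of user name to the set of projects they may charge.
    Both switches default to off, so clusters that ignore the feature
    behave as before.
    """
    allowed = allowed or {}
    if project is None:
        if required:
            raise SubmissionRejected("a project is required on this cluster")
        return None
    if enforce and project not in allowed.get(user, set()):
        raise SubmissionRejected(
            f"user {user} may not submit applications for project {project}"
        )
    return project


# Example policy with made-up names.
policy = {"alice": {"pipelineX", "pipelineY"}, "bob": {"pipelineX"}}

print(check_project("alice", None))                                     # accepted, no project
print(check_project("bob", "pipelineX", enforce=True, allowed=policy))  # accepted
try:
    check_project("bob", "pipelineY", enforce=True, allowed=policy)
except SubmissionRejected as exc:
    print("rejected:", exc)
```

Keeping both switches off by default mirrors the stated goal that organizations not interested in chargeback see no change in behavior.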
[jira] [Created] (YARN-415) Capture memory utilization at the app-level for chargeback
Kendall Thrapp created YARN-415: --- Summary: Capture memory utilization at the app-level for chargeback Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp

For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it.

(reserved RAM for container 1 * lifetime of container 1) + (reserved RAM for container 2 * lifetime of container 2) + ... + (reserved RAM for container n * lifetime of container n)

It'd be nice to have this at the app level instead of the job level because:
1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server).
2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).

This new metric should be available both through the RM UI and RM Web Services REST API.
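The MB-seconds formula in YARN-415 is a straightforward sum over containers. The sketch below shows the arithmetic only. The `ContainerUsage` record, its field names and the sample numbers are assumptions made for illustration and are not part of any YARN API.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    reserved_mb: int   # memory reserved for the container, in MB
    start_s: int       # container start time, in seconds
    end_s: int         # container finish time, in seconds


def memory_mb_seconds(containers):
    """Sum of (reserved MB * lifetime in seconds) over all containers."""
    return sum(c.reserved_mb * (c.end_s - c.start_s) for c in containers)


# Two hypothetical containers: 2048 MB for 600 s and 1024 MB for 300 s.
app_containers = [
    ContainerUsage(reserved_mb=2048, start_s=0, end_s=600),
    ContainerUsage(reserved_mb=1024, start_s=30, end_s=330),
]

# 2048 * 600 + 1024 * 300 = 1228800 + 307200 = 1536000 MB-seconds
print(memory_mb_seconds(app_containers))
```

Because the sum uses reserved rather than actually used memory, an application is charged for capacity it held even if it left part of it idle, which matches the reasoning in the issue.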