[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196264#comment-16196264 ]

Yufei Gu commented on YARN-3332:
--------------------------------

Is this done by ATSv2? cc [~haibo.chen]

> [Umbrella] Unified Resource Statistics Collection per node
> ----------------------------------------------------------
>
>                 Key: YARN-3332
>                 URL: https://issues.apache.org/jira/browse/YARN-3332
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, NodeManager collects statistics like per container resource
> usage and overall physical resources available on the machine. Currently this
> is used internally in YARN by the NodeManager for only a limited usage:
> automatically determining the capacity of resources on node and enforcing
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics
> for usage beyond the existing usecases.
> Proposal attached in comments.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605966#comment-14605966 ]

Allen Wittenauer commented on YARN-3332:
----------------------------------------

Why is this a YARN JIRA and not in HADOOP?
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522774#comment-14522774 ]

Vinod Kumar Vavilapalli commented on YARN-3332:
-----------------------------------------------

Unfortunately, other pieces started moving in sooner than I could start on this: YARN-3534 (in progress) and YARN-3334 (part of Timeline service next-gen, YARN-2928). So I am planning to do a refactor once those two go into trunk.

Thanks for offering to get involved. Once they go in, I can file sub-tasks for moving forward.
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521926#comment-14521926 ]

Karthik Kambatla commented on YARN-3332:
----------------------------------------

[~vinodkv] - did you start implementing this? I would like to be involved in the work here - either implementing parts of it or reviewing most of it.
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356418#comment-14356418 ]

Karthik Kambatla commented on YARN-3332:
----------------------------------------

bq. the machine level big picture is fragmented between YARN and HDFS (and HBase etc)

What constitutes the machine-level big picture? Isn't this just the overall node's resource usage? YARN, at least as of today, doesn't need to know about the usage stats of HDFS or HBase. I have nothing against going the service route, except for the additional daemon one might end up having to run.

bq. I anyways needed a service to expose an API for both admins/users as well as external systems beyond HDFS too - I can imagine tools being built on top of this.

This is not as clear to me. Let us say an admin and a user want usage stats about their YARN containers. The service can only provide the usage stats, while YARN will be able to provide other container metadata. Also, we should consider the privacy of usage information. Will auth against this new service be additional overhead?

bq. That said, it doesn't need to be service or library. I can think of a library that wires into the exposed API, though I haven't found uses for that yet.

Sorry, I didn't get that. Can you clarify/elaborate?
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356309#comment-14356309 ]

Zhijie Shen commented on YARN-3332:
-----------------------------------

It sounds like a great proposal, thanks Vinod!

A quick thought about the publishing channel for the collected statistics. I'm not sure how different the access pattern would be, but, just thinking out loud, is it possible to reuse the timeline service to distribute the node statistics, so that we avoid maintaining different but similar interfaces (or multiple data flow channels)? One step further, we can make the timeline service the main bus to transmit metrics from A to B.
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356260#comment-14356260 ]

Vinod Kumar Vavilapalli commented on YARN-3332:
-----------------------------------------------

Chose the service model because the machine level big picture is fragmented between YARN and HDFS (and HBase etc) - having a lower level common statistics layer is useful.

I anyways needed a service to expose an API for both admins/users as well as external systems beyond HDFS too - I can imagine tools being built on top of this.

That said, it doesn't need to be service or library. I can think of a library that wires into the exposed API, though I haven't found uses for that yet.
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356261#comment-14356261 ]

Vinod Kumar Vavilapalli commented on YARN-3332:
-----------------------------------------------

Agreed, this should be entirely possible.
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355923#comment-14355923 ]

Lei Guo commented on YARN-3332:
-------------------------------

To support customized resources, here is a quick list of areas we need to consider:
- resource definition: how the NM/RM understands the resource; this should be considered metrics-based
- plug-in framework in the NM/agent
  * interface for passing resource information between the plug-in and the agent; this could be another RPC interface, so the plug-in can be written in any language
  * interface for loading/triggering the plug-in (optional); this interface is optional because the plug-in could be as simple as a cron job
- sample resource collection plug-in for a specific resource (or resource set); this could be a script or a Java class, depending on the plug-in framework design
- communication protocol between RM/NM to support customized resources

This topic is related to our proposal at the June Hadoop Summit on multiple dimension scheduling.
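To make the plug-in idea in the comment above concrete, here is a minimal in-process sketch in Java. Everything in it is hypothetical: `ResourceCollectorPlugin`, `StaticGpuPlugin`, and `PluginRegistry` are illustrative names that do not exist in YARN, the GPU readings are hard-coded placeholders, and the RPC and cron-job variants mentioned in the list are not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract; not a YARN interface.
interface ResourceCollectorPlugin {
    String resourceName();            // name of the customized resource, e.g. "gpu"
    Map<String, Long> collect();      // metric name -> current reading
}

// Sample plug-in for one resource. The values are placeholders, not real measurements.
final class StaticGpuPlugin implements ResourceCollectorPlugin {
    @Override
    public String resourceName() {
        return "gpu";
    }

    @Override
    public Map<String, Long> collect() {
        Map<String, Long> readings = new LinkedHashMap<>();
        readings.put("devices", 2L);
        readings.put("memoryMb", 16384L);
        return readings;
    }
}

// Agent-side registry (assumed design): holds plug-ins and merges their
// readings into a single map keyed as "<resource>.<metric>".
final class PluginRegistry {
    private final List<ResourceCollectorPlugin> plugins = new ArrayList<>();

    void register(ResourceCollectorPlugin plugin) {
        plugins.add(plugin);
    }

    Map<String, Long> collectAll() {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (ResourceCollectorPlugin plugin : plugins) {
            for (Map.Entry<String, Long> reading : plugin.collect().entrySet()) {
                merged.put(plugin.resourceName() + "." + reading.getKey(), reading.getValue());
            }
        }
        return merged;
    }
}

class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new StaticGpuPlugin());
        System.out.println(registry.collectAll());
    }
}
```

Treating every reading as a named metric keeps the agent unaware of what a "gpu" is, which matches the suggestion that the resource definition be metrics-based. How such readings would travel from NM to RM is left open here, as it is in the comment.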
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355901#comment-14355901 ]

Li Lu commented on YARN-3332:
-----------------------------

Hi [~grey], I think it's a nice idea. I think after YARN-2928, the timeline service layer would support this kind of usage (we're supporting "metrics" as a generic concept). What we need to do under this JIRA is to make the interface available at the NM level, I think?

BTW, it would be cool to have GPU metrics, but I'm not sure if there are any general ways to gather this information. It would be helpful if you could elaborate a little bit more (if that's related to this JIRA). Thanks!
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355891#comment-14355891 ]

Lei Guo commented on YARN-3332:
-------------------------------

Is there any consideration for supporting plug-ins for customized resource statistics collection in the NM? We may need other types of resource information for scheduling purposes later, for example GPU-related information.
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355798#comment-14355798 ]

Karthik Kambatla commented on YARN-3332:
----------------------------------------

Thanks for filing this and working on the design, Vinod. I like the idea of a clean interface to get node and container resource usage info.

Is there any reason why you think a service architecture is better than a common library? How much information is shared among the consumers of this interface? For instance, both HDFS and YARN would be interested in the availability and usage of CPU, memory, disk and network for the entire node. Isn't all other information of interest to only one of them?

I have other questions/comments on the design, but will hold off until we decide on service vs library.
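For a sense of what the library route might look like for the node-wide numbers mentioned above, here is a rough Java sketch that uses only the standard `java.lang.management` and `java.io.File` APIs. `NodeSnapshot` is a hypothetical name, not part of Hadoop, and the sketch covers only CPU count, load average, and disk space for one volume; physical memory and network usage are not exposed by these standard APIs and are left out.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical value class holding a few node-wide readings.
final class NodeSnapshot {
    final int cpus;                // processors visible to the JVM
    final double loadAverage;      // one-minute load average; negative if the platform cannot report it
    final long diskTotalBytes;     // size of the volume containing the given path
    final long diskUsableBytes;    // bytes available to this JVM on that volume

    private NodeSnapshot(int cpus, double loadAverage, long diskTotalBytes, long diskUsableBytes) {
        this.cpus = cpus;
        this.loadAverage = loadAverage;
        this.diskTotalBytes = diskTotalBytes;
        this.diskUsableBytes = diskUsableBytes;
    }

    // Takes one reading for the volume that contains `volume`.
    static NodeSnapshot capture(File volume) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return new NodeSnapshot(
                os.getAvailableProcessors(),
                os.getSystemLoadAverage(),
                volume.getTotalSpace(),
                volume.getUsableSpace());
    }

    @Override
    public String toString() {
        return "cpus=" + cpus + ", load=" + loadAverage
                + ", diskTotal=" + diskTotalBytes + ", diskUsable=" + diskUsableBytes;
    }
}

class NodeSnapshotDemo {
    public static void main(String[] args) {
        System.out.println(NodeSnapshot.capture(new File(".")));
    }
}
```

With a library like this, each daemon on the node would take its own readings in-process, which avoids an extra daemon but duplicates the collection work; a service would collect once and serve all consumers. That trade-off is the one under discussion in this thread.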
[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node
[ https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355583#comment-14355583 ]

Vinod Kumar Vavilapalli commented on YARN-3332:
-----------------------------------------------

Linking related tickets that can leverage this: YARN-2928, YARN-2745.