Container resource usage is published to the ATS v2 (Timeline Service v2) metrics system. If you do not want to run the heavy ATS v2 subsystem, though, I am not aware of any current interface that exposes a container's actual resource usage in a way that solves your problem. One option I can think of is extending *ContainerManagementProtocol.getContainerStatuses* so that at least the AM is aware of each container's actual resource usage. Thoughts?
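
Until something like that exists, one option that needs no extra subsystem is the log parsing you mentioned in your original mail. Below is a minimal sketch, assuming the message text looks exactly like the sample quoted further down in this thread. The regex, the names parse_usage_line and ContainerMemSample, and the unit table are my own illustration, not a YARN API, and the message format may differ between Hadoop versions.

```python
import re
from typing import NamedTuple, Optional

# Multipliers that convert the units printed in the log message to megabytes.
_UNIT_TO_MB = {"B": 1.0 / (1024 * 1024), "KB": 1.0 / 1024, "MB": 1.0,
               "GB": 1024.0, "TB": 1024.0 * 1024}

_SIZE = r"([\d.]+) ([KMGT]?B)"  # a size such as "1.9 GB"

# Assumed pattern: matches the message text as quoted in this thread only.
_USAGE_RE = re.compile(
    r"Memory usage of ProcessTree (\d+) for container-id (container_\w+): "
    + _SIZE + " of " + _SIZE + " physical memory used; "
    + _SIZE + " of " + _SIZE + " virtual memory used"
)


class ContainerMemSample(NamedTuple):
    container_id: str
    pmem_used_mb: float
    pmem_limit_mb: float
    vmem_used_mb: float
    vmem_limit_mb: float


def _to_mb(value: str, unit: str) -> float:
    """Convert a (number, unit) pair from the log message to megabytes."""
    return float(value) * _UNIT_TO_MB[unit]


def parse_usage_line(line: str) -> Optional[ContainerMemSample]:
    """Return a sample if the line carries a memory usage message, else None."""
    match = _USAGE_RE.search(line)
    if match is None:
        return None
    g = match.groups()  # g[0] is the process tree id, which is not kept
    return ContainerMemSample(
        container_id=g[1],
        pmem_used_mb=_to_mb(g[2], g[3]),
        pmem_limit_mb=_to_mb(g[4], g[5]),
        vmem_used_mb=_to_mb(g[6], g[7]),
        vmem_limit_mb=_to_mb(g[8], g[9]),
    )


# Usage with the sample message from the original question.
sample = parse_usage_line(
    "Memory usage of ProcessTree 57251 for container-id "
    "container_e116_1495951495692_35134_01_000001: 1.9 GB of 11 GB "
    "physical memory used; 14.4 GB of 23.1 GB virtual memory used"
)
```

Each log line's own timestamp prefix (not shown in the quoted sample, so not parsed here) would give you the time axis for a per-container graph.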

On Thu, Jun 15, 2017 at 7:29 PM, Sunil G <sun...@apache.org> wrote:

> And adding to that, we have aggregated container usage per node. I don't
> think you'll have per-container real memory usage recorded from YARN.
> You'll have these 2 entries in ideal cases.
>
> Resource Utilization by Node :
> Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
>
> Thanks
> Sunil
>
> On Thu, Jun 15, 2017 at 6:56 AM Sunil G <sun...@apache.org> wrote:
>
>> Hi Shmuel
>>
>> This feature is available in the Hadoop 2.8+ release lines, or in the
>> Hadoop 3 alphas.
>>
>> Thanks
>> Sunil
>>
>> On Wed, Jun 14, 2017 at 6:31 AM Shmuel Blitz <shmuel.bl...@similarweb.com>
>> wrote:
>>
>>> Hi Sunil,
>>>
>>> Thanks for your response.
>>>
>>> Here is the response I get when running "yarn node -status {nodeId}":
>>>
>>> Node Report :
>>>     Node-Id : myNode:4545
>>>     Rack : /default
>>>     Node-State : RUNNING
>>>     Node-Http-Address : muNode:8042
>>>     Last-Health-Update : Wed 14/Jun/17 08:25:43:261EST
>>>     Health-Report :
>>>     Containers : 7
>>>     Memory-Used : 44032MB
>>>     Memory-Capacity : 49152MB
>>>     CPU-Used : 16 vcores
>>>     CPU-Capacity : 48 vcores
>>>     Node-Labels :
>>>
>>> However, this is information regarding the entire node, containing all
>>> containers.
>>>
>>> I have no way of using this to see whether the value I give to
>>> 'spark.executor.memory' makes sense or not.
>>>
>>> I'm looking for memory usage/allocated information *per-container*.
>>>
>>> Shmuel
>>>
>>> On Wed, Jun 14, 2017 at 4:04 PM, Sunil G <sun...@apache.org> wrote:
>>>
>>>> Hi Shmuel
>>>>
>>>> In the Hadoop 2.8 release line, you can check the
>>>> "yarn node -status {nodeId}" CLI command or the
>>>> "http://<rm http address:port>/ws/v1/cluster/nodes/{nodeid}"
>>>> REST endpoint to get the containers' actual resource usage per node.
>>>> You can also check the same in any of the Hadoop 3.0 alpha releases.
>>>>
>>>> Thanks
>>>> Sunil
>>>>
>>>> On Tue, Jun 13, 2017 at 11:29 PM Shmuel Blitz <shmuel.bl...@similarweb.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for your response.
>>>>>
>>>>> The /metrics API returns a blank page on our RM.
>>>>>
>>>>> The /jmx API has some metrics, but these are the same metrics we are
>>>>> already loading into Datadog.
>>>>> It's not good enough, because it doesn't break down the memory use by
>>>>> container.
>>>>>
>>>>> I need the by-container breakdown because resource allocation is per
>>>>> container and I would like to see whether my job is really using up all
>>>>> the allocated memory.
>>>>>
>>>>> Shmuel
>>>>>
>>>>> On Tue, Jun 13, 2017 at 6:05 PM, Sidharth Kumar <sidharthkumar2...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I guess you can get it from http://<resourcemanager-host>:<rm-port>/jmx
>>>>>> or /metrics
>>>>>>
>>>>>> Regards
>>>>>> Sidharth
>>>>>> LinkedIn: www.linkedin.com/in/sidharthkumar2792
>>>>>>
>>>>>> On 13-Jun-2017 6:26 PM, "Shmuel Blitz" <shmuel.bl...@similarweb.com>
>>>>>> wrote:
>>>>>>
>>>>>>> (This question has also been published on Stack Overflow
>>>>>>> <https://stackoverflow.com/q/44484940/416300>)
>>>>>>>
>>>>>>> I am looking for a way to monitor memory usage of YARN containers
>>>>>>> over time.
>>>>>>>
>>>>>>> Specifically - given a YARN application-id, how can you get a graph
>>>>>>> showing the memory usage of each of its containers over time?
>>>>>>>
>>>>>>> The main goal is to better fit memory allocation requirements for
>>>>>>> our YARN applications (Spark / Map-Reduce), to avoid over-allocation
>>>>>>> and cluster resource waste. A side goal would be the ability to debug
>>>>>>> memory issues when developing our jobs and attempting to pick
>>>>>>> reasonable resource allocations.
>>>>>>>
>>>>>>> We've tried using the Datadog integration, but it doesn't break
>>>>>>> down the metrics by container.
>>>>>>>
>>>>>>> Another approach was to parse the hadoop-yarn logs.
>>>>>>> These logs have messages like:
>>>>>>>
>>>>>>> Memory usage of ProcessTree 57251 for container-id
>>>>>>> container_e116_1495951495692_35134_01_000001: 1.9 GB of 11 GB
>>>>>>> physical memory used; 14.4 GB of 23.1 GB virtual memory used
>>>>>>>
>>>>>>> Parsing the logs correctly can yield data that can be used to plot a
>>>>>>> graph of memory usage over time.
>>>>>>>
>>>>>>> That's exactly what we want, but there are two downsides:
>>>>>>>
>>>>>>> 1. It involves reading human-readable log lines and parsing them into
>>>>>>>    numeric data. We'd love to avoid that.
>>>>>>> 2. If this data can be consumed otherwise, we're hoping it'll have more
>>>>>>>    information that we might be interested in in the future. We wouldn't
>>>>>>>    want to put the time into parsing the logs just to realize we need
>>>>>>>    something else.
>>>>>>>
>>>>>>> Is there any other way to extract these metrics, either by plugging
>>>>>>> in to an existing producer or by writing a simple listener?
>>>>>>>
>>>>>>> Perhaps a whole other approach?
>>>>>>>
>>>>>>> --
>>>>>>> Shmuel Blitz
>>>>>>> *Big Data Developer*
>>>>>>> www.similarweb.com