Ferenc Erdelyi created YARN-11774:
-------------------------------------
Summary: DominantResourceCalculator - Used Resources Percentage
Metrics is Incorrect
Key: YARN-11774
URL: https://issues.apache.org/jira/browse/YARN-11774
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn-service
Reporter: Ferenc Erdelyi
Attachments: DominantResourceCalculator_repro1_metrics.png
The issue occurs when using the Dominant Resource Calculator.
Reproduction steps:
- Create two queues: root.a and root.b. Submit a vcores-heavy application to
root.a and a memory-heavy application to root.b.
- Make sure the applications have started running, then navigate to YARN UI1 and
check the Used Resources percentage of each queue.
We expect the percentage to be based on the resource the application is heavy
on. E.g. if that is vcores, we expect the ratio of the used vcores to the
effective vcores, multiplied by 100 to get the %.
In some cases the calculated value is incorrect. See the screenshot.
!DominantResourceCalculator_repro1_metrics.png!
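The expected calculation described above can be sketched as follows. This is only an illustration of the reporter's expectation, not the scheduler's actual code; the function name `used_percentage` is hypothetical, and the inputs are the vcores values quoted in this report.

```python
def used_percentage(used: float, effective: float) -> float:
    """Return used / effective expressed as a percentage."""
    if effective <= 0:
        raise ValueError("effective capacity must be positive")
    return used / effective * 100.0


# Values taken from this report: 11 vcores used against 5 effective vcores.
expected_vcores_pct = used_percentage(used=11, effective=5)
print(f"expected vcores used capacity: {expected_vcores_pct:.1f}%")
```

The same helper applies to memory by passing the used and effective memory values of the queue.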
root.a queue
{code:java}
hadoop jar
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
-shell_command 'while true; echo Timestamp: \"\"\$(date +%Y-%m-%d\
%H:%M:%S)\"\"; do sleep 3600; done' -jar
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
--num_containers 2 --master_memory 1024 --master_vcores 5 --container_memory
1024 --container_vcores 5 -queue root.a
{code}
root.b queue
{code:java}
hadoop jar
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
-shell_command 'while true; echo Timestamp: \"\"\$(date +%Y-%m-%d\
%H:%M:%S)\"\"; do sleep 3600; done' -jar
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
--num_containers 2 --master_memory 5120 --master_vcores 1 --container_memory
5120 --container_vcores 1 -queue root.b
{code}
Observation:
The VCores "Used Capacity" percentage calculation is not intuitive.
Out of a queue capacity of 5 vcores, we used 11 vcores (over the queue capacity).
Intuitively, based on my understanding, we expect the percentage to be
calculated as 11/5*100=220, but we get a different value: 206.3.
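To help narrow down where the reported vcores value comes from, the small sketch below back-solves what the displayed percentage would imply. It uses only the three numbers from the vcores observation (11 used, 5 effective, 206.3 displayed). The variable names are hypothetical, and this does not claim to reproduce the scheduler's internal formula.

```python
used_vcores = 11        # used vcores from the observation
effective_vcores = 5    # queue capacity in vcores from the observation
reported_pct = 206.3    # value shown in the UI

# Percentage expected from the plain used / effective ratio.
expected_pct = used_vcores / effective_vcores * 100

# Difference between expected and displayed values, in percentage points.
gap_points = expected_pct - reported_pct

# Used vcores the UI value would imply if the capacity really is 5 vcores.
implied_used = reported_pct / 100 * effective_vcores

# Effective capacity the UI value would imply if 11 vcores really are used.
implied_effective = used_vcores / (reported_pct / 100)

print(f"expected {expected_pct:.1f}%, reported {reported_pct:.1f}%")
print(f"gap: {gap_points:.1f} percentage points")
print(f"implied used vcores: {implied_used:.3f}")
print(f"implied effective vcores: {implied_effective:.3f}")
```

Neither implied value is a whole number of vcores, which suggests the displayed figure is not a simple vcores-only ratio.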
For memory, I was not able to confirm the issue in the "Used Capacity"
calculation; however, it seems to occur from time to time.