[ https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372905#comment-15372905 ]
Nathan Roberts edited comment on YARN-5356 at 7/12/16 2:07 PM:
---------------------------------------------------------------

bq. Nathan Roberts, I understand that your problem is that with the current approach you know that you have 6 cores available to the NM and 4 of them are used. However, the machine is not that utilized (~30%). Correct? In that case, we would only need to report the actual size of the machine at registration time as it would never change. Not sure that ResourceUtilization would be the right place for that as it would be reported in every heartbeat continuously.

[~elgoiri], Yep, that's exactly correct. I think reporting the physical capabilities of the machine during registration should be ok. At least with Linux it is technically possible for the machine to change (e.g. echo 0 > /sys/devices/system/cpu/cpu3/online, OR memory gets automatically removed because it's getting ECC errors, OR something reserves a bunch of memory for huge pages, OR NIC re-negotiates from 10G to 1G), but I think these might be unusual enough that we could ignore them. I originally suggested tweaking ResourceUtilization due to this small chance of a physical resource changing but am happy to go either way.

> ResourceUtilization should also include resource availability
> -------------------------------------------------------------
>
>                 Key: YARN-5356
>                 URL: https://issues.apache.org/jira/browse/YARN-5356
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Nathan Roberts
>
> Currently ResourceUtilization contains absolute quantities of resource used
> (e.g. 4096MB memory used). It would be good if it also included how much of
> that resource is actually available on the node so that the RM can use this
> data to schedule more effectively (overcommit, etc.).
>
> Currently the only available information is the Resource the node registered
> with (or later updated using updateNodeResource). However, these aren't
> really sufficient to get a good view of how utilized a resource is. For
> example, if a node reports 400% CPU utilization, does that mean it's
> completely full, or barely utilized? Today there is no reliable way to figure
> this out.
>
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you
> have thoughts/opinions on this?
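
To make the 400% CPU example from the issue description concrete, here is a minimal sketch of how a utilization figure could be interpreted once the physical size of the machine is known. This is not YARN code: the class and method names are hypothetical, the convention that 100% means one fully busy core is an assumption, and the 4-core and 16-core machine sizes are illustrative only.

```java
/**
 * Sketch only: a hypothetical helper, not part of the YARN API.
 */
final class NodeUtilizationSketch {

  private NodeUtilizationSketch() {
  }

  /**
   * Converts an absolute CPU reading into a fraction of the whole machine.
   *
   * @param cpuPercentUsed CPU in use, assuming 100 means one fully busy core
   * @param physicalCores  cores physically present on the machine
   *                       (assumed to be reported by the node)
   * @return fraction of the machine's CPU that is in use
   */
  static double cpuFraction(double cpuPercentUsed, int physicalCores) {
    if (physicalCores <= 0) {
      throw new IllegalArgumentException("physicalCores must be positive");
    }
    return cpuPercentUsed / (100.0 * physicalCores);
  }

  public static void main(String[] args) {
    // The same 400% reading on two hypothetical machine sizes.
    System.out.println("4 cores:  " + cpuFraction(400, 4));
    System.out.println("16 cores: " + cpuFraction(400, 16));
  }
}
```

With these assumed sizes, the same 400% reading means the 4-core machine is completely full while the 16-core machine is only a quarter used, which is the distinction the RM cannot make from the absolute number alone.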