[ https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756845#comment-16756845 ]
Hudson commented on YARN-9099:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15859 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15859/])
YARN-9099. GpuResourceAllocator#getReleasingGpus calculates number of (sunilg: rev 71c49fa60faad2504b0411979a6e46e595b97a85)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceAllocator.java

> GpuResourceAllocator#getReleasingGpus calculates number of GPUs in a wrong way
> ------------------------------------------------------------------------------
>
>                 Key: YARN-9099
>                 URL: https://issues.apache.org/jira/browse/YARN-9099
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1, 3.1.3
>
>         Attachments: YARN-9099.001.patch, YARN-9099.002.patch
>
>
> getReleasingGpus plays an important role in the calculation that happens
> when GpuAllocator assigns GPUs to a container, see:
> GpuResourceAllocator#internalAssignGpus.
> If multiple GPUs are assigned to the same container, getReleasingGpus will
> return an invalid number.
> The iterator goes over the mappings of (GPU device, container ID) and
> retrieves each container by its ID once for every device the container ID
> is mapped to.
> Then, for every container retrieved, the resource value for the GPU resource
> is added to a running sum.
> Consequently, if a container is mapped to 2 or more devices, the container's
> GPU resource value is added to the running sum as many times as the number
> of GPU devices the container has.
> Example:
> Let's suppose {{usedDevices}} contains these mappings:
> - (GPU1, container1)
> - (GPU2, container1)
> - (GPU3, container2)
> GPU resource value is 2 for container1 and
> GPU resource value is 1 for container2.
> Then, if container1 is in a running state, getReleasingGpus will return 4
> instead of 2.
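
The double counting in the example above can be reproduced with a small, self-contained Java sketch. This is not the Hadoop code and not the attached patch. The class name, the method names, the use of plain strings for devices and container IDs, and the countedContainers set (a stand-in for whatever container-state check the real method applies) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the scenario in the issue description; not NodeManager code.
class ReleasingGpusSketch {

    // Mirrors the reported behaviour: the container's GPU resource value is
    // added once per (device, container) mapping.
    static long perMappingSum(Map<String, String> usedDevices,
                              Map<String, Long> gpuResourceValue,
                              Set<String> countedContainers) {
        long sum = 0;
        for (String containerId : usedDevices.values()) {
            if (countedContainers.contains(containerId)) {
                sum += gpuResourceValue.get(containerId);
            }
        }
        return sum;
    }

    // One possible correction (an assumption, not necessarily what the patch
    // does): add each container's GPU resource value at most once.
    static long perContainerSum(Map<String, String> usedDevices,
                                Map<String, Long> gpuResourceValue,
                                Set<String> countedContainers) {
        long sum = 0;
        Set<String> seen = new HashSet<>();
        for (String containerId : usedDevices.values()) {
            // Set.add returns false when the container was already visited.
            if (countedContainers.contains(containerId) && seen.add(containerId)) {
                sum += gpuResourceValue.get(containerId);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The (GPU device, container ID) mappings from the example.
        Map<String, String> usedDevices = new LinkedHashMap<>();
        usedDevices.put("GPU1", "container1");
        usedDevices.put("GPU2", "container1");
        usedDevices.put("GPU3", "container2");

        // GPU resource value per container, as given in the example.
        Map<String, Long> gpuResourceValue = new HashMap<>();
        gpuResourceValue.put("container1", 2L);
        gpuResourceValue.put("container2", 1L);

        // Only container1 passes the (hypothetical) state check.
        Set<String> countedContainers = new HashSet<>();
        countedContainers.add("container1");

        System.out.println(perMappingSum(usedDevices, gpuResourceValue, countedContainers));   // 4
        System.out.println(perContainerSum(usedDevices, gpuResourceValue, countedContainers)); // 2
    }
}
```

With the mappings from the example, the per-mapping loop visits container1 twice and adds its value of 2 both times, giving 4. Visiting each container once gives the expected 2.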