[ https://issues.apache.org/jira/browse/YARN-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574710#comment-16574710 ]
Bibin A Chundatt commented on YARN-8450:
----------------------------------------

[~sunilg]/[~eepayne]

This issue is mainly exposed during container kill and preemption scenarios.

What do you think about moving the resource check into {{ResourceHandlerChain}}? One solution could be to wait until the resource is released by the {{resourceHandlers}} that have strict binding. Alternatively, we could add a {{canAssign}} interface to the resource handlers and query {{canAssign}} until a timeout.

Thoughts?

> Blocking resources such as GPU/FPGA etc tend to release actual device slowly
> even after RM identifies it as COMPLETED
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8450
>                 URL: https://issues.apache.org/jira/browse/YARN-8450
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.2
>            Reporter: Sunil Govindan
>            Assignee: Bilwa S T
>            Priority: Major
>
> For resources such as GPU/FPGA or similar resources, sometimes we have seen
> that device is not released from a container even after container is in
> completed states.
> In such cases, we need a common way of handling from NM level. YARN-8423 is
> only handling this for GPU.
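
To make the second option in the comment concrete (a {{canAssign}} check that is polled until a timeout), here is a minimal, standalone sketch. It is not the existing NodeManager {{ResourceHandler}} API: the {{AssignableCheck}} interface, the {{waitUntilAssignable}} helper, and the timeout and poll-interval values are all hypothetical names chosen for illustration, and the devices are simulated with lambdas.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class CanAssignSketch {

    /** Hypothetical interface; NOT part of the current ResourceHandler API. */
    public interface AssignableCheck {
        /** Returns true once the underlying device has actually been released. */
        boolean canAssign();
    }

    /**
     * Polls all handlers until every one reports canAssign() == true,
     * or until the timeout expires.
     *
     * @return true if all handlers became assignable in time,
     *         false on timeout or if the waiting thread is interrupted.
     */
    public static boolean waitUntilAssignable(List<AssignableCheck> handlers,
                                              long timeoutMs,
                                              long pollIntervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            boolean allFree = true;
            for (AssignableCheck handler : handlers) {
                if (!handler.canAssign()) {
                    allFree = false;
                    break;
                }
            }
            if (allFree) {
                return true;
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                return false; // timed out while a device was still held
            }
            long remainingMs = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNs));
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollIntervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated GPU that is released about 50 ms after the container completes.
        final long freeAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(50);
        AssignableCheck slowGpu = () -> System.nanoTime() >= freeAt;
        // Simulated device that is never released.
        AssignableCheck stuckDevice = () -> false;

        // Released within the timeout: prints true
        System.out.println(
            waitUntilAssignable(Collections.singletonList(slowGpu), 1000, 10));
        // Never released: prints false after roughly 100 ms
        System.out.println(
            waitUntilAssignable(Collections.singletonList(stuckDevice), 100, 10));
    }
}
```

In this sketch a timeout is reported as {{false}} and the caller decides what to do next (for example, fail the container launch or retry). Whether the wait belongs in {{ResourceHandlerChain}} or elsewhere in the NM is exactly the open question raised in the comment.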