[ https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342990#comment-16342990 ]
Wangda Tan commented on YARN-7739:
----------------------------------

I think I made a mistake earlier: the existing YARN RM already rejects any resource request whose ask exceeds maximum_allocation_calculated_based_on_registered_nodes. The only thing it does not do is handle resource types other than memory/vcores.

I just uploaded a patch (ver.1) that handles customized resource types and adds tests for both scenarios. Since this logic lives inside DefaultAMSProcessor, no scheduler changes are required.

[~sunil.gov...@gmail.com], could you help review the patch?

> Revisit scheduler resource normalization behavior for max allocation
> --------------------------------------------------------------------
>
>                 Key: YARN-7739
>                 URL: https://issues.apache.org/jira/browse/YARN-7739
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: YARN-7739.001.patch
>
>
> Currently, the YARN scheduler normalizes a requested resource based on the
> maximum allocation, which is derived from the configured maximum allocation
> and the maximum registered node resources. In effect, the scheduler silently
> caps the requested resource at the maximum allocation.
> This can cause issues for applications. For example, a Spark job needs 12 GB
> of memory to run, but the registered NMs in the cluster have at most 8 GB of
> memory on each node, so the scheduler allocates an 8 GB container to the
> requesting application.
> Once the app receives containers from the RM, if it doesn't double-check the
> allocated resources, it will hit an OOM that is hard to debug, because the
> scheduler silently capped the request at the maximum allocation.
> With the introduction of non-mandatory resources, this becomes worse. For
> resources like GPU, we typically set the minimum allocation to 0, since not
> all nodes have GPU devices. So an application can ask for 4 GPUs but get
> 0 GPUs, which is a big problem.
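
To make the difference between the two behaviors concrete, here is a minimal sketch contrasting silent capping with rejection. It is not the code in the patch or in DefaultAMSProcessor: plain maps stand in for YARN's resource type, the class and method names are hypothetical, and the maximum values (8 GB memory, 0 GPUs) are assumptions taken from the examples in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: plain maps stand in for YARN's resource type, and every
// name in this class is hypothetical rather than taken from the Hadoop code.
class MaxAllocationSketch {

    // Silent capping, as the issue description puts it: each requested value
    // is reduced to the maximum allocation and the caller is never told.
    static Map<String, Long> capToMax(Map<String, Long> ask, Map<String, Long> max) {
        Map<String, Long> capped = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : ask.entrySet()) {
            // Assumption: a resource type with no configured maximum counts as 0.
            long limit = max.getOrDefault(e.getKey(), 0L);
            capped.put(e.getKey(), Math.min(e.getValue(), limit));
        }
        return capped;
    }

    // Rejection: fail the request if any resource type, including custom ones
    // such as "gpu", asks for more than the maximum allocation.
    static void validate(Map<String, Long> ask, Map<String, Long> max) {
        for (Map.Entry<String, Long> e : ask.entrySet()) {
            long limit = max.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > limit) {
                throw new IllegalArgumentException(
                    "Requested " + e.getKey() + "=" + e.getValue()
                        + " exceeds maximum allocation " + limit);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed cluster: nodes offer at most 8 GB and no GPUs.
        Map<String, Long> max = Map.of("memory-mb", 8192L, "vcores", 8L, "gpu", 0L);
        Map<String, Long> ask = Map.of("memory-mb", 12288L, "gpu", 4L);

        Map<String, Long> capped = capToMax(ask, max);
        System.out.println("capped memory-mb: " + capped.get("memory-mb"));
        System.out.println("capped gpu: " + capped.get("gpu"));

        try {
            validate(ask, max);
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

With capping, the application only finds out about the shortfall if it inspects the container it was given; with rejection, the request fails up front and the error names the offending resource type.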