[ https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Kambatla reassigned YARN-3997:
--------------------------------------

    Assignee: Karthik Kambatla  (was: Arun Suresh)

> An application requesting multi-core containers can't preempt a running
> application made of single-core containers
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3997
>                 URL: https://issues.apache.org/jira/browse/YARN-3997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.1
>         Environment: Ubuntu 14.04, Hadoop 2.7.1, physical machines
>            Reporter: Dan Shechter
>            Assignee: Karthik Kambatla
>            Priority: Critical
>
> When our cluster is configured with preemption and is fully loaded with an
> application consuming 1-core containers, it will not kill off those
> containers when a new application kicks in requesting, for example, 4-core
> containers.
> When the "second" application requests 1-core containers as well, preemption
> proceeds as planned and everything works properly.
> My assumption is that the fair scheduler, while recognizing it needs to kill
> off some containers to make room for the new application, fails to find a
> SINGLE container satisfying the request for a 4-core container (since all
> existing containers are 1-core containers), and isn't "smart" enough to
> realize it needs to kill off 4 single-core containers (in this case) on a
> single node for the new application to be able to proceed.
> The exhibited effect is that the new application hangs indefinitely and
> never gets the resources it requires.
> This can easily be replicated with any YARN application. Our "go-to"
> scenario in this case is running pyspark with 1-core executors (containers)
> while trying to launch the H2O.ai framework, which INSISTS on having at
> least 4 cores per container.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
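Editor's note: to make the suspected selection problem concrete, below is a minimal, self-contained sketch in plain Java. It is not FairScheduler code, and every name in it (Container, Node, preemptSingleContainer, preemptOnOneNode) is illustrative only. It contrasts a policy that preempts only when a single running container is at least as large as the request with one that aggregates the cores of several small containers on one node, which is the behavior the reporter expects.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreemptionSketch {

    // One running container with a fixed number of cores.
    static class Container {
        final int cores;
        Container(int cores) { this.cores = cores; }
    }

    // A node hosting several running containers.
    static class Node {
        final String name;
        final List<Container> containers = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Behavior as reported: look for one running container at least as large as
    // the request. With only 1-core containers running, nothing matches, so
    // nothing is preempted and the multi-core request starves.
    static boolean preemptSingleContainer(List<Node> nodes, int requestedCores) {
        for (Node node : nodes) {
            for (int i = 0; i < node.containers.size(); i++) {
                if (node.containers.get(i).cores >= requestedCores) {
                    node.containers.remove(i);
                    return true;
                }
            }
        }
        return false; // no single container is big enough
    }

    // Expected behavior: preempt several small containers on the same node until
    // the cores freed on that node cover the request.
    static boolean preemptOnOneNode(List<Node> nodes, int requestedCores) {
        for (Node node : nodes) {
            int preemptable = node.containers.stream().mapToInt(c -> c.cores).sum();
            if (preemptable >= requestedCores) {
                int freed = 0;
                while (freed < requestedCores && !node.containers.isEmpty()) {
                    freed += node.containers.remove(node.containers.size() - 1).cores;
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A fully loaded node running four 1-core containers, plus a new 4-core request.
        Node n1 = new Node("node1");
        for (int i = 0; i < 4; i++) n1.containers.add(new Container(1));
        List<Node> cluster = Arrays.asList(n1);

        System.out.println("single-container match: " + preemptSingleContainer(cluster, 4)); // false
        System.out.println("aggregate per node:     " + preemptOnOneNode(cluster, 4));       // true
    }
}

Running main prints false for the single-container match and true for the per-node aggregation, mirroring the hang described above: a 4-core request can never be satisfied by preempting any one 1-core container, but can be satisfied by preempting four of them on the same node.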