[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated YARN-7086: ----------------------------- Target Version/s: 3.5.0 (was: 3.4.0) > Release all containers aynchronously > ------------------------------------ > > Key: YARN-7086 > URL: https://issues.apache.org/jira/browse/YARN-7086 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Reporter: Arun Suresh > Assignee: Manikandan R > Priority: Major > Attachments: YARN-7086.001.patch, YARN-7086.002.patch, > YARN-7086.Perf-test-case.patch > > > We have noticed in production two situations that can cause deadlocks and > cause scheduling of new containers to come to a halt, especially with regard > to applications that have a lot of live containers: > # When these applicaitons release these containers in bulk. > # When these applications terminate abruptly due to some failure, the > scheduler releases all its live containers in a loop. > To handle the issues mentioned above, we have a patch in production to make > sure ALL container releases happen asynchronously - and it has served us well. > Opening this JIRA to gather feedback on if this is a good idea generally (cc > [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd]) > BTW, In YARN-6251, we already have an asyncReleaseContainer() in the > AbstractYarnScheduler and a corresponding scheduler event, which is currently > used specifically for the container-update code paths (where the scheduler > realeases temp containers which it creates for the update) -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org