[ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645159#comment-16645159 ]
Jason Lowe commented on YARN-7086:
----------------------------------

Thanks for developing a perf test case!

The huge variations in runtime need to be investigated. The second test case varies by up to 63%, including multiple samples that are slower than the existing code's average. With this data, I would argue the results are close to the noise range given the wild swings in measurements. How could it sometimes be well over 50% faster? Is the JVM hitting a large GC? System I/O?

I see the test is spamming logs on stdout in a tight loop while measuring timing -- that's not good. I could see I/O effects dominating the runtimes. Try running this where the test produces as little output as possible while running. No stdout printing in the tight loop, use a log4j.properties that suppresses the RM logging, etc. We need to get the runs to be a lot more consistent, otherwise we're probably not measuring what we think we're measuring.

> Release all containers asynchronously
> -------------------------------------
>
>                 Key: YARN-7086
>                 URL: https://issues.apache.org/jira/browse/YARN-7086
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-7086.001.patch, YARN-7086.002.patch,
> YARN-7086.Perf-test-case.patch
>
>
> We have noticed in production two situations that can cause deadlocks and
> cause scheduling of new containers to come to a halt, especially with regard
> to applications that have a lot of live containers:
> # When these applications release these containers in bulk.
> # When these applications terminate abruptly due to some failure, the
> scheduler releases all its live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc
> [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the
> AbstractYarnScheduler and a corresponding scheduler event, which is currently
> used specifically for the container-update code paths (where the scheduler
> releases temp containers which it creates for the update)
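
The measurement advice in the comment above (no stdout printing inside the timed loop, and checking that runs are consistent before comparing them) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class QuietBenchmark, its methods, and the stand-in workload are hypothetical and are not taken from the YARN-7086 patches or from the attached perf test case.

```java
/**
 * Hypothetical timing harness (not part of any YARN-7086 patch).
 * It keeps all output outside the timed region and reports how
 * much the samples vary, so noisy runs are visible up front.
 */
public class QuietBenchmark {

    /** Summary statistics over per-run wall-clock times in nanoseconds. */
    public static final class Stats {
        public final double mean;
        public final double stdDev;
        public final long min;
        public final long max;

        Stats(double mean, double stdDev, long min, long max) {
            this.mean = mean;
            this.stdDev = stdDev;
            this.min = min;
            this.max = max;
        }

        /** Coefficient of variation: sample standard deviation divided by the mean. */
        public double cv() {
            return mean == 0.0 ? 0.0 : stdDev / mean;
        }

        /** Range of the samples relative to the mean: (max - min) / mean. */
        public double spread() {
            return mean == 0.0 ? 0.0 : (max - min) / mean;
        }
    }

    /** Runs the workload untimed for warm-up, then records one sample per measured run. */
    public static long[] measure(Runnable workload, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            workload.run(); // lets the JIT compile hot paths before timing starts
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            workload.run();
            samples[i] = System.nanoTime() - start; // no printing or logging in this loop
        }
        return samples;
    }

    /** Computes mean, sample standard deviation, min and max of the samples. */
    public static Stats summarize(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0.0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long s : samples) {
            sum += s;
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double mean = sum / samples.length;
        double squaredDiffs = 0.0;
        for (long s : samples) {
            double d = s - mean;
            squaredDiffs += d * d;
        }
        double stdDev = samples.length > 1
            ? Math.sqrt(squaredDiffs / (samples.length - 1))
            : 0.0;
        return new Stats(mean, stdDev, min, max);
    }

    public static void main(String[] args) {
        // Stand-in CPU-bound workload; a real test would exercise the release path instead.
        long[] sink = new long[1];
        Runnable workload = () -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += i ^ (acc >>> 3);
            }
            sink[0] = acc; // keep the result live so the loop is not optimized away
        };

        Stats stats = summarize(measure(workload, 5, 20));

        // All output happens once, after every sample has been taken.
        System.out.printf("mean=%.0f ns, stddev=%.0f ns, cv=%.1f%%, spread=%.1f%%%n",
            stats.mean, stats.stdDev, stats.cv() * 100, stats.spread() * 100);
    }
}
```

If the reported spread stays large even with output suppressed, that would point toward the other suspects raised above (GC pauses, system I/O) rather than toward the code under test. A dedicated harness such as JMH may be a better fit for a real comparison; the sketch only shows the shape of the idea.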