[jira] [Commented] (YARN-7086) Release all containers aynchronously

Jason Lowe (JIRA) Wed, 10 Oct 2018 08:36:24 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645159#comment-16645159
 ]


Jason Lowe commented on YARN-7086:
----------------------------------

Thanks for developing a perf test case!  The huge variations in runtime need to
be investigated.  The second test case varies by up to 63%, and multiple
samples are slower than the existing code's average.  With this data, I would
argue the results are close to the noise range given the wild swings in
measurements.  How could it sometimes be well over 50% faster?  Is the JVM
hitting a large GC?  System I/O?  I see the test is spamming logs on stdout in
a tight loop while measuring timing -- that's not good.  I could see I/O
effects dominating the runtimes.  Try running this so that the test produces as
little output as possible while running: no stdout printing in the tight loop,
a log4j.properties that suppresses the RM logging, etc.  We need to get the
runs to be a lot more consistent, otherwise we're probably not measuring what
we think we're measuring.
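
To make the suggestion concrete, here is a minimal sketch of a quieter timing
loop.  It is not taken from the attached perf-test patch; the class name
TimingHarness, the placeholder workload, and the warm-up and sample counts are
all assumptions for illustration.  It runs untimed warm-up iterations first,
prints nothing inside the timed loop, and reports the spread once at the end.

```java
public class TimingHarness {
    // Written by the placeholder workload so the JIT cannot discard it.
    static volatile long sink;

    /** Runs the task untimed for warm-up, then returns per-sample elapsed nanos. */
    static long[] sample(Runnable task, int warmup, int samples) {
        for (int i = 0; i < warmup; i++) {
            task.run();                       // warm-up: let the JIT settle, not timed
        }
        long[] elapsed = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            task.run();                       // no logging or printing in here
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    static double mean(long[] xs) {
        double sum = 0;
        for (long x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** (max - min) / mean: a rough indicator of run-to-run noise. */
    static double relativeSpread(long[] xs) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        return (max - min) / mean(xs);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the code under test (e.g. a bulk release).
        Runnable workload = () -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += i % 7;
            }
            sink = acc;
        };
        long[] t = sample(workload, 20, 30);
        // Report once, after all measurements are taken.
        System.out.printf("mean=%.0f ns, spread=%.1f%%%n",
                mean(t), 100 * relativeSpread(t));
    }
}
```

If the spread stays large even with output suppressed, that would point toward
GC or other system effects rather than logging, which is worth knowing before
comparing the patched and unpatched numbers.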


> Release all containers aynchronously
> ------------------------------------
>
>                 Key: YARN-7086
>                 URL: https://issues.apache.org/jira/browse/YARN-7086
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-7086.001.patch, YARN-7086.002.patch, 
> YARN-7086.Perf-test-case.patch
>
>
> We have noticed in production two situations that can cause deadlocks and 
> cause scheduling of new containers to come to a halt, especially with regard 
> to applications that have a lot of live containers:
> # When these applications release these containers in bulk.
> # When these applications terminate abruptly due to some failure, the 
> scheduler releases all its live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make 
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally
> (cc [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases temp containers which it creates for the update)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
