[ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623100#comment-16623100
 ] 

Manikandan R commented on YARN-7086:
------------------------------------

[~jlowe] I ran a simple performance test to understand the container release 
behaviour. The test releases 10K containers in a single AM allocate call and 
measures the time taken (in seconds) to release all of them under the 
following three flows:

1. Existing code: No changes.

2. With Patch (Async release + multiple container list traversal): Used 
.002.patch as is, with a batch size of 1K.

3. With Patch (Not Async release + multiple container list traversal): Slightly 
modified .002.patch to call the new completeContainers(Map<RMContainer, 
ContainerStatus> containersToBeReleased, RMContainerEventType event) directly 
rather than going through the event flow.
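
As a rough illustration of the difference between flows 2 and 3, here is a minimal, self-contained sketch. It is not taken from .002.patch: the class BatchedReleaseSketch, its methods, and the single-threaded executor standing in for the scheduler event dispatcher are all hypothetical, and plain integers stand in for containers. It only shows the shape of the two paths, namely splitting the release list into batches of 1K and then either handing each batch to a queue or processing it on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in: none of these names come from the YARN code base.
class BatchedReleaseSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Flow 3 analogue: every batch is released on the calling thread. */
    static <T> int releaseDirectly(List<T> items, int batchSize,
                                   Consumer<List<T>> releaser) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            releaser.accept(batch);
        }
        return batches.size();
    }

    /**
     * Flow 2 analogue: every batch is queued on a single-threaded executor,
     * which is only an assumed stand-in for an event dispatcher. The wait at
     * the end exists so the sketch can report a result; a real asynchronous
     * caller would return right after queueing.
     */
    static <T> int releaseAsync(List<T> items, int batchSize,
                                Consumer<List<T>> releaser) {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            dispatcher.submit(() -> releaser.accept(batch));
        }
        dispatcher.shutdown();
        try {
            if (!dispatcher.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("release did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<Integer> containerIds = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            containerIds.add(i);
        }
        AtomicInteger released = new AtomicInteger();
        int batches = releaseAsync(containerIds, 1_000,
                batch -> released.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + released.get() + " released");
    }
}
```

In the sketch the only difference between the two paths is which thread runs the release callback, which is the variable the comparison between flows 2 and 3 is meant to isolate.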

 
||Run||Existing code (secs)||With Patch (Async release + multiple container list traversal) (secs)||With Patch (Not Async release + multiple container list traversal) (secs)||
|1|6.8|4.6|8.6|
|2|8.3|7.5|9.9|
|3|6.8|7.2|8.2|
|4|7.2|7.1|8.9|
|5|7.2|4.6|10|
|Average of 5 runs|7.26|6.2|9.12|
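
For anyone who wants to re-derive the averages, the small sketch below recomputes them from the five runs in the table and also reports each patched flow's change relative to the existing code. The class and method names are hypothetical and only serve this illustration. With only five runs and a fairly wide spread between them, the percentages are best read as indicative.

```java
// Hypothetical helper for re-checking the table; not part of any patch.
class ReleaseTimingSummary {

    // Per-run timings in seconds, copied from the table above.
    static final double[] EXISTING = {6.8, 8.3, 6.8, 7.2, 7.2};
    static final double[] ASYNC = {4.6, 7.5, 7.2, 7.1, 4.6};
    static final double[] NOT_ASYNC = {8.6, 9.9, 8.2, 8.9, 10.0};

    /** Arithmetic mean of the given values. */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Change of candidate relative to baseline, in percent (negative means faster). */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double existing = mean(EXISTING);
        double async = mean(ASYNC);
        double notAsync = mean(NOT_ASYNC);
        System.out.printf("existing=%.2f async=%.2f notAsync=%.2f%n",
                existing, async, notAsync);
        System.out.printf("async vs existing: %.1f%%%n",
                percentChange(existing, async));
        System.out.printf("not async vs existing: %.1f%%%n",
                percentChange(existing, notAsync));
    }
}
```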

 

I am attaching a patch that contains only the test case used for the above 
flows. Can you please validate the approach?

> Release all containers asynchronously
> -------------------------------------
>
>                 Key: YARN-7086
>                 URL: https://issues.apache.org/jira/browse/YARN-7086
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-7086.001.patch, YARN-7086.002.patch, 
> YARN-7086.Perf-test-case.patch
>
>
> We have noticed in production two situations that can cause deadlocks and 
> cause scheduling of new containers to come to a halt, especially with regard 
> to applications that have a lot of live containers:
> # When these applications release these containers in bulk.
> # When these applications terminate abruptly due to some failure, the 
> scheduler releases all its live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make 
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc 
> [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases temp containers which it creates for the update)


