[ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590556#comment-16590556
 ] 

Jason Lowe commented on YARN-7086:
----------------------------------

bq. I assume you are referring to the lock inside LeafQueue#completedContainer().

I was referring to the scheduler in the 2.7/2.8 code, which has changed 
considerably in trunk since then.  Back in 2.7, releasing a container required 
the highly contended CapacityScheduler lock to be obtained separately for 
every container released.  When an AM released a lot of containers in a single 
heartbeat, this caused a long backup because the contended lock had to be 
reacquired for every released container.  It would have been far more 
efficient to grab the lock once and release all the containers while holding 
it the entire time.
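
A minimal Java sketch of the difference, assuming a single ReentrantLock stands in for the contended scheduler (or LeafQueue) lock. SchedulerSketch, releasePerContainer() and releaseBatch() are hypothetical names used only for illustration, not YARN APIs, and the sketch only counts lock acquisitions; it does not model real contention or timing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class BatchReleaseSketch {

    // Hypothetical stand-in for a scheduler whose lock is highly contended.
    static class SchedulerSketch {
        private final ReentrantLock lock = new ReentrantLock();
        private final Set<String> liveContainers = new HashSet<>();
        private int acquisitions = 0;

        SchedulerSketch(List<String> containers) {
            liveContainers.addAll(containers);
        }

        // Pattern described for 2.7: the lock is reacquired for every container.
        void releasePerContainer(List<String> released) {
            for (String id : released) {
                lock.lock();
                try {
                    acquisitions++;
                    liveContainers.remove(id);
                } finally {
                    lock.unlock();
                }
            }
        }

        // Suggested alternative: take the lock once for the whole heartbeat's batch.
        void releaseBatch(List<String> released) {
            lock.lock();
            try {
                acquisitions++;
                for (String id : released) {
                    liveContainers.remove(id);
                }
            } finally {
                lock.unlock();
            }
        }

        int lockAcquisitions() {
            return acquisitions;
        }

        int liveCount() {
            return liveContainers.size();
        }
    }

    public static void main(String[] args) {
        List<String> released = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            released.add("container_" + i);
        }

        SchedulerSketch perContainer = new SchedulerSketch(released);
        perContainer.releasePerContainer(released);

        SchedulerSketch batched = new SchedulerSketch(released);
        batched.releaseBatch(released);

        System.out.println("per-container acquisitions: " + perContainer.lockAcquisitions());
        System.out.println("batched acquisitions: " + batched.lockAcquisitions());
    }
}
```

With 1000 released containers the per-container path takes the lock 1000 times and the batched path once; under real contention, each of those acquisitions can mean queuing behind other scheduler threads.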

The big CapacityScheduler lock appears to be gone in trunk, so I would expect 
the next level of locking bottleneck to be the LeafQueue lock.

> Release all containers aynchronously
> ------------------------------------
>
>                 Key: YARN-7086
>                 URL: https://issues.apache.org/jira/browse/YARN-7086
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-7086.001.patch
>
>
> We have noticed two situations in production that can cause deadlocks and 
> bring scheduling of new containers to a halt, especially for applications 
> that have a lot of live containers:
> # When these applications release their containers in bulk.
> # When these applications terminate abruptly due to some failure and the 
> scheduler releases all of their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production that makes 
> sure ALL container releases happen asynchronously, and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea in general 
> (cc [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251 we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases the temporary containers it creates for the update).
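
One way to picture the asynchronous approach is a queue drained by a single dispatcher thread: callers such as the heartbeat handler or the application-failure path enqueue release requests and return immediately, and the dispatcher applies whatever has accumulated under one lock acquisition. This is a hypothetical Java sketch; AsyncReleaseSketch, asyncRelease() and awaitLiveCount() are illustrative names and do not reproduce the real AbstractYarnScheduler#asyncReleaseContainer() or its scheduler event from YARN-6251.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

public class AsyncReleaseSketch {
    // Pending release requests; producers never touch the scheduler lock.
    private final BlockingQueue<String> releaseQueue = new LinkedBlockingQueue<>();
    // Hypothetical stand-in for the contended scheduler/queue lock.
    private final ReentrantLock schedulerLock = new ReentrantLock();
    private final Set<String> liveContainers = new HashSet<>();

    public AsyncReleaseSketch(List<String> initialContainers) {
        liveContainers.addAll(initialContainers);
        Thread dispatcher = new Thread(this::drainLoop, "release-dispatcher");
        dispatcher.setDaemon(true); // lets the JVM exit without an explicit shutdown
        dispatcher.start();
    }

    // Called from the heartbeat or app-failure path; returns immediately.
    public void asyncRelease(String containerId) {
        releaseQueue.add(containerId);
    }

    // Dispatcher: block for one request, drain everything else already queued,
    // then release the whole batch under a single lock acquisition.
    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(releaseQueue.take());
                releaseQueue.drainTo(batch);
                schedulerLock.lock();
                try {
                    for (String id : batch) {
                        liveContainers.remove(id);
                    }
                } finally {
                    schedulerLock.unlock();
                }
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit the loop when interrupted
        }
    }

    public int liveCount() {
        schedulerLock.lock();
        try {
            return liveContainers.size();
        } finally {
            schedulerLock.unlock();
        }
    }

    // Polls until the live count reaches `expected` or the timeout elapses.
    public boolean awaitLiveCount(int expected, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (liveCount() == expected) {
                return true;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return liveCount() == expected;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            ids.add("container_" + i);
        }
        AsyncReleaseSketch scheduler = new AsyncReleaseSketch(ids);
        // Simulates an application failing abruptly: every container is released.
        for (String id : ids) {
            scheduler.asyncRelease(id);
        }
        System.out.println("all released: " + scheduler.awaitLiveCount(0, 5_000));
    }
}
```

The drainTo() call is what turns a burst of releases into a single locked batch; the trade-off is that a container is freed slightly after the request that released it returns.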



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)