[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819248#comment-15819248
 ] 

Jason Lowe commented on YARN-574:
---------------------------------

Thanks for picking this up [~ajithshetty].  I took a quick look at the patch.  
It looks OK at a high level, but there is a race condition in how we're dealing 
with the thread pool.  The code assumes that work submitted to the queue will 
be picked up instantly by an idle thread in the thread pool.  If it's not 
picked up fast enough, we can end up doing one or more super-quick heartbeats 
and accidentally queue up more work than the thread pool has threads to run.  
That could actually make the localization _slower_ when there are multiple 
containers for the same job on the same node, since a resource already handed 
to one localizer cannot be worked on by another container localizer that has 
idle threads.

IMHO we can trivially track the outstanding count ourselves.  We simply need to 
increment an AtomicInteger when we submit the work to the executor, then wrap 
FSDownload in another Callable that decrements the AtomicInteger when 
FSDownload returns or throws.  Then we can track how many resources are either 
pending or actively being downloaded without getting bitten by race conditions 
in the executor implementation.  Alternatively, the createStatus method already 
walks the Future objects returned from the executor, and we could calculate 
there how many resources are in-progress (i.e., either pending or actively 
being downloaded).  Once there are as many in-progress resources as the 
configured parallelism, we should avoid making quick heartbeats.
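
To make the first option concrete, here is a rough sketch of the counting 
wrapper.  It is not taken from the patch: the class name InProgressTracker and 
its methods are hypothetical, and a generic Callable stands in for FSDownload.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of the YARN-574 patch. */
final class InProgressTracker {
  private final ExecutorService pool;
  private final int parallelism;
  // Resources that are either queued in the executor or actively downloading.
  private final AtomicInteger inProgress = new AtomicInteger();

  InProgressTracker(ExecutorService pool, int parallelism) {
    this.pool = pool;
    this.parallelism = parallelism;
  }

  /** Counts the download before it is handed to the executor. */
  <T> Future<T> submit(final Callable<T> download) {
    inProgress.incrementAndGet();
    try {
      return pool.submit(new Callable<T>() {
        @Override
        public T call() throws Exception {
          try {
            return download.call();
          } finally {
            // Runs whether the download returns or throws.
            inProgress.decrementAndGet();
          }
        }
      });
    } catch (RuntimeException e) {
      // The executor rejected the task, so the wrapper will never run.
      inProgress.decrementAndGet();
      throw e;
    }
  }

  /** Number of resources that are pending or being downloaded. */
  int inProgress() {
    return inProgress.get();
  }

  /** True while another resource can be requested via a quick heartbeat. */
  boolean hasCapacity() {
    return inProgress.get() < parallelism;
  }
}
```

The heartbeat loop would then only do a quick heartbeat while hasCapacity() 
returns true.  One caveat with this sketch: if a Future is cancelled before its 
task starts, the wrapper never runs and the count is not decremented, so 
cancellation would need separate handling.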


> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-574
>                 URL: https://issues.apache.org/jira/browse/YARN-574
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.6.0, 2.8.0, 2.7.1
>            Reporter: Omkar Vinit Joshi
>            Assignee: Ajith S
>         Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, 
> YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise the downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads, but this is not used and only one resource is sent for 
> downloading at a time.
> I think we can increase / assure parallelism (even for a single container 
> requesting resources) for private/application resources by making multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers [private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number 
> of containers * max downloads per container [private and application resource]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
