[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524603#comment-15524603
 ] 

Chris Douglas commented on YARN-5621:
-------------------------------------

That summary of work seems about right, thanks for putting it together.

You raise excellent points about error handling. Your sketch includes a channel 
communicating which resources were (un)successfully linked. The script-driven 
approach handles this in v05 by writing a separate bash script and invoking the 
CE for each symlink (which, to be fair, isn't exactly "lightweight" when 
compared to extending {{ContainerLocalizer}}). In v05, a failure affects only 
one resource, but to take your earlier example of linking a batch of resources 
in the script: how would one handle partial failures? What's the state of the 
container and resources when the script invocation fails?
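
To make the partial-failure question concrete, here is a minimal sketch of a batch linker that reports a per-resource outcome instead of a single exit code. {{SymlinkBatch}} and {{linkAll}} are hypothetical names, and the sketch uses plain {{java.nio}} rather than the CE or the v05 script; it only illustrates the kind of result channel being discussed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not a YARN class.
final class SymlinkBatch {
  private SymlinkBatch() {}

  /**
   * Attempts every link in the batch independently.
   * Returns link path -> null on success, or the IOException that stopped it.
   */
  static Map<Path, IOException> linkAll(Map<Path, Path> linkToTarget) {
    Map<Path, IOException> outcome = new LinkedHashMap<>();
    for (Map.Entry<Path, Path> e : linkToTarget.entrySet()) {
      try {
        Files.createSymbolicLink(e.getKey(), e.getValue());
        outcome.put(e.getKey(), null);
      } catch (IOException ex) {
        // Record the failure and keep going; links already created stay in place.
        outcome.put(e.getKey(), ex);
      }
    }
    return outcome;
  }
}
```

A caller would still have to decide what a mixed result means for the container, which is the open question above.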

On the CL proposal: either the CI initiates the symlink request to the 
{{ResourceLocalizationService}} after download, or the two operations are 
contained within that service. The complexity is comparable. The 2-phase 
protocol you sketch (CI initiates download, then link) adds a window in which 
the CL could shut down before it receives the {{LINK}} commands (causing two CL 
launches), but even a short timeout would likely cover that.

A single message annotating the resource (download+symlink) could add states to 
{{LocalizedResource}} if it were to notify starting containers directly 
(current code) or hand off to the RLS for symlink. In this case, the protocol to 
the {{ContainerImpl}} is simpler (resending/retry is idempotent b/c it doesn't 
care if the download or symlink failed). Both {{FetchSuccessTransition}} and 
{{LocalizedResourceTransition}} would need to send 
{{LocalizerResourceRequestEvent}} for running containers to symlink. A failed 
symlink would look like a failed download to the CI. Start container is 
unaffected.
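
As a rough sketch of the single-message variant, the resource lifecycle might gain one extra state between download and completion. Everything here is hypothetical: the {{LINKING}} state, the event names, and the choice to let a repeated request restart a failed resource are assumptions for illustration, not the existing {{LocalizedResource}} state machine.

```java
// Hypothetical states and events; LINKING is the assumed addition.
enum SketchState { INIT, DOWNLOADING, LINKING, LOCALIZED, FAILED }

enum SketchEvent { REQUEST, FETCH_SUCCESS, FETCH_FAILURE, LINK_SUCCESS, LINK_FAILURE }

final class ResourceSketch {
  private SketchState state = SketchState.INIT;

  /** Applies one event and returns the resulting state; unexpected events are ignored. */
  SketchState handle(SketchEvent ev) {
    switch (state) {
      case INIT:
      case FAILED:
        // Assumption: a (re)sent request starts over, whatever failed before.
        if (ev == SketchEvent.REQUEST) state = SketchState.DOWNLOADING;
        break;
      case DOWNLOADING:
        if (ev == SketchEvent.FETCH_SUCCESS) state = SketchState.LINKING;
        else if (ev == SketchEvent.FETCH_FAILURE) state = SketchState.FAILED;
        break;
      case LINKING:
        if (ev == SketchEvent.LINK_SUCCESS) state = SketchState.LOCALIZED;
        else if (ev == SketchEvent.LINK_FAILURE) state = SketchState.FAILED;
        break;
      default:
        // LOCALIZED: a repeated request changes nothing.
        break;
    }
    return state;
  }
}
```

In this sketch a download failure and a symlink failure both end in {{FAILED}}, so the requester sees the same outcome either way, and resending {{REQUEST}} while work is in flight is a no-op.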

For the CL itself... sure, {{ResourceLocalizationSpec}} needs another field 
for symlinks. This side is pretty straightforward, right?

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-5621
>                 URL: https://issues.apache.org/jira/browse/YARN-5621
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, 
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
