[ 
https://issues.apache.org/jira/browse/YARN-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405770#comment-15405770
 ] 

Arun Suresh commented on YARN-4602:
-----------------------------------

[~djp], wondering if you've taken a look at Apache REEF. It uses an 
event-driven framework called Wake (https://reef.apache.org/wake.html) that 
also supports messaging. It uses Netty under the hood.

> Simple and Scalable Message Service for YARN application
> --------------------------------------------------------
>
>                 Key: YARN-4602
>                 URL: https://issues.apache.org/jira/browse/YARN-4602
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: applications, resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>
> We are proposing to support work-preserving MR AM restart in 
> MAPREDUCE-6608 (https://issues.apache.org/jira/browse/MAPREDUCE-6608), so 
> that when the AM fails for some reason, the in-flight tasks keep running or 
> pending until a new AM attempt comes back to continue. One prerequisite is 
> that tasks must know where the new AM attempt was launched, so that 
> TaskUmbilicalProtocol can retry between the clients and the new server.
> Other applications running on YARN could have the same requirement. Some 
> applications handle message delivery themselves; e.g., long-running 
> services can leverage the Slider agent to pass messages back and forth. 
> However, this is hard for vanilla applications on YARN to achieve because 
> the Hadoop RPC mechanism is essentially one-way communication. Although a 
> two-way mechanism such as heartbeats (between NM-RM or AM-RM) can be built 
> on top of it, it makes less sense to build the same mechanism between an AM 
> and its application containers: the AM would have to handle a massive number 
> of client connections, which could become a new scalability bottleneck and 
> would make state maintenance very complicated. Instead, we need a new 
> message mechanism that is simple and scalable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

