[ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684624#comment-13684624 ]

Avner BenHanoch commented on YARN-802:
--------------------------------------


Hi Siddharth,
In addition to what I wrote earlier, I just noticed that we may have a
misunderstanding.
See "Implementing a Custom Shuffle" in the [Hadoop documentation about Pluggable
Shuffle|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]:
{quote}
A custom shuffle implementation requires a
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService implementation
class running in the NodeManagers and a
org.apache.hadoop.mapred.ShuffleConsumerPlugin implementation class running in
the Reducer tasks.
The default implementations provided by Hadoop can be used as references:
• org.apache.hadoop.mapred.ShuffleHandler
• org.apache.hadoop.mapreduce.task.reduce.Shuffle
{quote}

In this issue we are talking about the *provider side of the Shuffle*: we want
to run additional ShuffleProvider(s) alongside the default ShuffleHandler.
All shuffle providers are AuxServices running in the NodeManager. I see them
as daemons, like ShuffleHandler, not as applications or jobs. If I
understand correctly, to simultaneously support multiple jobs from multiple
users, each of which may contact a different shuffle provider, all providers
must be running in parallel.
With this, the ShuffleConsumer in each reducer will be able to request MOFs
(map output files) from its desired provider. However, the provider must know
where on disk the MOFs of a particular job are located. The way it learns this
(see ShuffleHandler) is through the mapping jobId -> userId, and the data for
this mapping arrives with the APPLICATION_INIT event. Hence, all AuxServices
that serve as ShuffleProviders need to receive APPLICATION_INIT events.
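To make that dependency concrete, here is a minimal sketch of a third-party shuffle provider that records jobId -> userId when it receives APPLICATION_INIT and later uses the mapping to locate a MOF. It is self-contained and deliberately avoids Hadoop classes: the AuxService interface, the ThirdPartyShuffleProvider class, and the directory layout in locateMof are hypothetical simplifications, not the real Hadoop API or the actual ShuffleHandler code.

```java
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the NodeManager's auxiliary-service
// callbacks; the real Hadoop interface differs between versions.
interface AuxService {
    // Called on APPLICATION_INIT; the sketch treats the application ID as the job ID.
    void initApp(String user, String jobId, ByteBuffer serviceData);

    // Called on APPLICATION_STOP.
    void stopApp(String jobId);
}

// Hypothetical third-party shuffle provider: it can only locate a job's
// MOFs if it has already seen APPLICATION_INIT for that job.
class ThirdPartyShuffleProvider implements AuxService {
    // jobId -> userId, filled in from APPLICATION_INIT
    private final Map<String, String> jobToUser = new ConcurrentHashMap<>();
    private final Path localDir;

    ThirdPartyShuffleProvider(Path localDir) {
        this.localDir = localDir;
    }

    @Override
    public void initApp(String user, String jobId, ByteBuffer serviceData) {
        jobToUser.put(jobId, user);
    }

    @Override
    public void stopApp(String jobId) {
        jobToUser.remove(jobId);
    }

    // Resolve where a map task's output lives. The directory layout below is
    // an assumption for illustration only.
    Path locateMof(String jobId, String mapId) {
        String user = jobToUser.get(jobId);
        if (user == null) {
            throw new IllegalStateException("No APPLICATION_INIT seen for " + jobId);
        }
        return localDir.resolve("usercache").resolve(user)
                .resolve("appcache").resolve(jobId)
                .resolve("output").resolve(mapId).resolve("file.out");
    }
}
```

Without an initApp call for the job, locateMof has no way to determine the user, which is exactly the situation a provider is in when APPLICATION_INIT is never delivered to it.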

                
> APPLICATION_INIT is never sent to AuxServices other than the builtin 
> ShuffleHandler
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-802
>                 URL: https://issues.apache.org/jira/browse/YARN-802
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications, nodemanager
>    Affects Versions: 2.0.4-alpha
>            Reporter: Avner BenHanoch
>
> APPLICATION_INIT is never sent to AuxServices other than the built-in 
> ShuffleHandler. This means that 3rd-party ShuffleProvider(s) will not be 
> able to function, because APPLICATION_INIT enables the AuxiliaryService to 
> map jobId->userId, which is needed to find the MOFs of a job when reducers 
> request them.
> NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events because of a 
> hard-coded expression in the Hadoop code. The current TaskAttemptImpl.java code 
> explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
> ...) and ignores any additional AuxiliaryService. As a result, only the 
> built-in ShuffleHandler will get APPLICATION_INIT events; any 3rd-party 
> AuxiliaryService will never get them.
> I think this can be solved in one of two ways:
> 1. Change TaskAttemptImpl.java to loop over all auxiliary services and register 
> each of them by calling serviceData.put(…) in a loop.
> 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668 
> "APPLICATION_STOP is never sent to AuxServices". That is, when the 
> 'handle' method gets an APPLICATION_INIT event, it will dispatch it to all 
> AuxServices regardless of the value of event.getServiceID().
> I prefer the 2nd solution and welcome any ideas. I can provide the 
> patch for whichever option people prefer.
> For more about Pluggable Shuffle, see: 
> http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html
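The routing problem described in the issue, and the two proposed fixes, can be modeled in a few lines. This is a hypothetical, self-contained sketch: AuxDispatcher, its methods, and the service IDs used with it are illustrative names, not Hadoop's actual AuxServices or TaskAttemptImpl code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of how a NodeManager might route
// APPLICATION_INIT to auxiliary services; not Hadoop's real API.
class AuxDispatcher {
    // serviceId -> IDs of applications that service has been initialized for
    private final Map<String, List<String>> initialized = new LinkedHashMap<>();

    void register(String serviceId) {
        initialized.put(serviceId, new ArrayList<String>());
    }

    // Current behaviour described in the issue: only services whose ID
    // appears in the container's serviceData receive APPLICATION_INIT.
    void initByServiceData(String appId, Map<String, ByteBuffer> serviceData) {
        for (String serviceId : serviceData.keySet()) {
            List<String> apps = initialized.get(serviceId);
            if (apps != null) {
                apps.add(appId);
            }
        }
    }

    // Option 1: the caller puts an entry for every registered service into
    // serviceData instead of just the built-in one (empty placeholder payloads).
    Map<String, ByteBuffer> serviceDataForAllServices() {
        Map<String, ByteBuffer> data = new LinkedHashMap<>();
        for (String serviceId : initialized.keySet()) {
            data.put(serviceId, ByteBuffer.allocate(0));
        }
        return data;
    }

    // Option 2 (the preferred one): deliver APPLICATION_INIT to every
    // registered service regardless of the event's service ID.
    void initBroadcast(String appId) {
        for (List<String> apps : initialized.values()) {
            apps.add(appId);
        }
    }

    boolean sawInit(String serviceId, String appId) {
        List<String> apps = initialized.get(serviceId);
        return apps != null && apps.contains(appId);
    }
}
```

With only the built-in ID in serviceData, initByServiceData reaches just that one service; passing serviceDataForAllServices() (option 1) or calling initBroadcast (option 2) reaches every registered provider. Option 2 keeps the decision inside the NodeManager, so the task-launching code in TaskAttemptImpl would not need to know which providers exist.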

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira