[ 
https://issues.apache.org/jira/browse/YARN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-1773:
-----------------------------

    Description: 
Currently, the ShuffleHeader (which is a Writable) simply tries to read the 
successful header (mapid, reduceid, etc.). If there is an error, the input 
contains an error message instead of (mapid, reduceid, etc.). Parsing the 
ShuffleHeader therefore fails, and since we don't know where the error message 
ends, we cannot consume the rest of the input stream, which may hold good data 
from the remaining map outputs. Being able to encode the error in the 
ShuffleHeader will let us parse out the error correctly and move on to the 
remaining data.
The shuffle handler response should say which maps are in error, which are 
fine, and what the error was for each erroneous map. This will help report 
diagnostics for easier upstream reporting.

  was:Currently, the ShuffleHeader (which is a Writable) simply tries to read 
the successful header (mapid, reduceid, etc.). If there is an error, the 
input contains an error message instead of (mapid, reduceid, etc.). Parsing 
the ShuffleHeader therefore fails, and since we don't know where the error 
message ends, we cannot consume the rest of the input stream, which may hold 
good data from the remaining map outputs. Being able to encode the error in 
the ShuffleHeader will let us parse out the error correctly and move on to the 
remaining data.


> ShuffleHeader should have a format that can inform about errors
> ---------------------------------------------------------------
>
>                 Key: YARN-1773
>                 URL: https://issues.apache.org/jira/browse/YARN-1773
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Bikas Saha
>            Priority: Critical
>
> Currently, the ShuffleHeader (which is a Writable) simply tries to read the 
> successful header (mapid, reduceid, etc.). If there is an error, the input 
> contains an error message instead of (mapid, reduceid, etc.). Parsing the 
> ShuffleHeader therefore fails, and since we don't know where the error 
> message ends, we cannot consume the rest of the input stream, which may hold 
> good data from the remaining map outputs. Being able to encode the error in 
> the ShuffleHeader will let us parse out the error correctly and move on to 
> the remaining data.
> The shuffle handler response should say which maps are in error, which are 
> fine, and what the error was for each erroneous map. This will help report 
> diagnostics for easier upstream reporting.
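
One way to picture the requested format is a header that carries a status 
byte and a length-prefixed error string, so a reader always knows where the 
header ends, even for a failed map. The sketch below is only an illustration 
under assumptions: the class name ShuffleHeaderSketch, the status byte, and 
the field layout are hypothetical, and plain java.io streams stand in for the 
Hadoop Writable interface. It is not the actual ShuffleHeader wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical status-tagged header; not the real Hadoop ShuffleHeader. */
final class ShuffleHeaderSketch {
    static final byte OK = 0;
    static final byte ERROR = 1;

    final byte status;
    final String mapId;
    final int reduceId;
    final long dataLength; // bytes of map output after the header (0 on error)
    final String error;    // empty when status == OK

    ShuffleHeaderSketch(byte status, String mapId, int reduceId,
                        long dataLength, String error) {
        this.status = status;
        this.mapId = mapId;
        this.reduceId = reduceId;
        this.dataLength = dataLength;
        this.error = error;
    }

    static ShuffleHeaderSketch ok(String mapId, int reduceId, long dataLength) {
        return new ShuffleHeaderSketch(OK, mapId, reduceId, dataLength, "");
    }

    static ShuffleHeaderSketch failed(String mapId, int reduceId, String error) {
        return new ShuffleHeaderSketch(ERROR, mapId, reduceId, 0L, error);
    }

    /** Every field is fixed-size or length-prefixed (writeUTF), so the end is known. */
    void write(DataOutputStream out) throws IOException {
        out.writeByte(status);
        out.writeUTF(mapId);
        out.writeInt(reduceId);
        out.writeLong(dataLength);
        out.writeUTF(error); // writeUTF is limited to 65535 encoded bytes
    }

    static ShuffleHeaderSketch read(DataInputStream in) throws IOException {
        byte status = in.readByte();
        String mapId = in.readUTF();
        int reduceId = in.readInt();
        long dataLength = in.readLong();
        String error = in.readUTF();
        return new ShuffleHeaderSketch(status, mapId, reduceId, dataLength, error);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        // A failed map output followed by a good one in the same response.
        failed("map_0", 3, "disk read failed").write(out);
        ok("map_1", 3, 4).write(out);
        out.write(new byte[] {1, 2, 3, 4});

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        ShuffleHeaderSketch first = read(in);
        ShuffleHeaderSketch second = read(in);
        byte[] payload = new byte[(int) second.dataLength];
        in.readFully(payload);
        System.out.println(first.mapId + " -> " + first.error);
        System.out.println(second.mapId + " -> " + payload.length + " bytes");
    }
}
```

In this sketch the reader gets past the failed entry for map_0 and still 
consumes the data for map_1, which is the behavior the description asks for. 
Keeping the error text inside the header also gives the fetcher a per-map 
diagnostic to report upstream.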



--
This message was sent by Atlassian JIRA
(v6.2#6252)
