[ https://issues.apache.org/jira/browse/YARN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated YARN-1773:
-----------------------------
    Description:
Currently, the ShuffleHeader (which is a Writable) simply tries to read the successful header (mapid, reduceid, etc.). If there is an error, the input contains an error message instead of (mapid, reduceid, etc.). Parsing the ShuffleHeader therefore fails, and since we don't know where the error message ends, we cannot consume the rest of the input stream, which may hold good data from the remaining map outputs. Being able to encode the error in the ShuffleHeader will let us parse out the error correctly and move on to the remaining data.

The shuffle handler response should say which maps are in error and which are fine, and what the error was for the erroneous maps. This will help report diagnostics for easier upstream reporting.

  was:
Currently, the ShuffleHeader (which is a Writable) simply tries to read the successful header (mapid, reduceid, etc.). If there is an error, the input contains an error message instead of (mapid, reduceid, etc.). Parsing the ShuffleHeader therefore fails, and since we don't know where the error message ends, we cannot consume the rest of the input stream, which may hold good data from the remaining map outputs. Being able to encode the error in the ShuffleHeader will let us parse out the error correctly and move on to the remaining data.

> ShuffleHeader should have a format that can inform about errors
> ---------------------------------------------------------------
>
>                 Key: YARN-1773
>                 URL: https://issues.apache.org/jira/browse/YARN-1773
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Bikas Saha
>            Priority: Critical
>
> Currently, the ShuffleHeader (which is a Writable) simply tries to read the
> successful header (mapid, reduceid, etc.). If there is an error, the input
> contains an error message instead of (mapid, reduceid, etc.). Parsing the
> ShuffleHeader therefore fails, and since we don't know where the error
> message ends, we cannot consume the rest of the input stream, which may hold
> good data from the remaining map outputs. Being able to encode the error in
> the ShuffleHeader will let us parse out the error correctly and move on to
> the remaining data.
> The shuffle handler response should say which maps are in error and which are
> fine, and what the error was for the erroneous maps. This will help report
> diagnostics for easier upstream reporting.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
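
The idea in the description, a header that carries a status and a length-delimited error, can be illustrated with a rough sketch. This is not the actual ShuffleHeader wire format and not a proposed patch: the class `StatusHeaderSketch`, the `OK`/`ERROR` status byte, the field order, the map ids, and the error text are all assumptions made up for illustration. The point it shows is that when the error message is length-prefixed (here via `writeUTF`), the reader knows where the error ends and can go on to the next map output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StatusHeaderSketch {
    // Hypothetical status codes; the real ShuffleHeader has no such field.
    static final byte OK = 0;
    static final byte ERROR = 1;

    /** Hypothetical per-map header: status first, then either a payload length or an error. */
    static final class Header {
        final byte status;
        final String mapId;
        final int reduceId;
        final int dataLength; // payload bytes following the header; 0 on error
        final String error;   // null when status == OK

        Header(byte status, String mapId, int reduceId, int dataLength, String error) {
            this.status = status;
            this.mapId = mapId;
            this.reduceId = reduceId;
            this.dataLength = dataLength;
            this.error = error;
        }

        void write(DataOutputStream out) throws IOException {
            out.writeByte(status);
            out.writeUTF(mapId);
            out.writeInt(reduceId);
            if (status == OK) {
                out.writeInt(dataLength);
            } else {
                out.writeUTF(error); // length-prefixed, so the reader knows where it ends
            }
        }

        static Header read(DataInputStream in) throws IOException {
            byte status = in.readByte();
            String mapId = in.readUTF();
            int reduceId = in.readInt();
            if (status == OK) {
                return new Header(OK, mapId, reduceId, in.readInt(), null);
            }
            return new Header(ERROR, mapId, reduceId, 0, in.readUTF());
        }
    }

    /** Builds a response with one failed map output between two good ones. */
    static byte[] buildResponse() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] payload = {1, 2, 3};
        new Header(OK, "map_0", 7, payload.length, null).write(out);
        out.write(payload);
        new Header(ERROR, "map_1", 7, 0, "map output not found").write(out);
        new Header(OK, "map_2", 7, payload.length, null).write(out);
        out.write(payload);
        out.flush();
        return buf.toByteArray();
    }

    /** Reads every header, skipping payloads; an error record does not stop parsing. */
    static List<Header> readAll(byte[] response) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(response));
        List<Header> headers = new ArrayList<>();
        while (in.available() > 0) {
            Header h = Header.read(in);
            in.skipBytes(h.dataLength); // a real fetcher would consume the payload here
            headers.add(h);
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        for (Header h : readAll(buildResponse())) {
            System.out.println(h.mapId + " -> " + (h.status == OK ? "ok" : "error: " + h.error));
        }
    }
}
```

With a layout along these lines, the reader can report per map which outputs succeeded and which failed, along with the error text, which is the diagnostic information the description asks for.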