[ 
https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010362#comment-16010362
 ] 

ASF GitHub Bot commented on YARN-5924:
--------------------------------------

Github user ameks94 commented on the issue:

    https://github.com/apache/hadoop/pull/164
  
    I realized that the current solution (allowing the RM to launch even 
with broken application data) is not good.
    It is better to crash the RM when an application's state file is broken. 
In that case we can report more detailed information about which file is 
broken (and perhaps recommend removing the application's folder with the 
broken data so that the RM can be launched successfully).
    Second, the most important part of the fix should be to find out why the 
file gets corrupted and how to prevent that.
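
A minimal sketch of the fail-fast idea described above, under stated assumptions: `FailFastAppStateLoader`, `StateParser`, and `CorruptAppStateException` are hypothetical names used only for illustration and are not part of `FileSystemRMStateStore`. The parser interface stands in for a protobuf `parseFrom` call, whose `InvalidProtocolBufferException` is a subclass of `IOException`. The point is only that the failure names the offending file and suggests a remedy.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual FileSystemRMStateStore code. */
class FailFastAppStateLoader {

  /** Stand-in for a protobuf parseFrom call (assumed interface). */
  interface StateParser {
    String parse(byte[] data) throws IOException;
  }

  /** Raised when one application's state file cannot be parsed. */
  static class CorruptAppStateException extends IOException {
    CorruptAppStateException(String path, Throwable cause) {
      super("Cannot recover application state from " + path
          + ". The file appears to be corrupt; consider removing the"
          + " application's directory and restarting the RM.", cause);
    }
  }

  /**
   * Parses every state file in iteration order and stops at the first
   * broken one, reporting its path instead of a bare parser error.
   */
  static Map<String, String> loadAll(Map<String, byte[]> files,
                                     StateParser parser)
      throws CorruptAppStateException {
    Map<String, String> recovered = new LinkedHashMap<>();
    for (Map.Entry<String, byte[]> entry : files.entrySet()) {
      try {
        recovered.put(entry.getKey(), parser.parse(entry.getValue()));
      } catch (IOException ex) {
        // Keep the original exception as the cause for diagnosis.
        throw new CorruptAppStateException(entry.getKey(), ex);
      }
    }
    return recovered;
  }
}
```

With this shape the RM still refuses to start, but the operator sees which path to inspect rather than only a protobuf stack trace.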


> Resource Manager fails to load state with InvalidProtocolBufferException
> ------------------------------------------------------------------------
>
>                 Key: YARN-5924
>                 URL: https://issues.apache.org/jira/browse/YARN-5924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Oleksii Dymytrov
>            Assignee: Oleksii Dymytrov
>         Attachments: YARN-5924.002.patch
>
>
> InvalidProtocolBufferException is thrown during recovery of the 
> application's state if the application's data has an invalid format (or is 
> broken) under the FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ 
> directory in HDFS:
> {noformat}
> com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
>       at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
>       at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
>       at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>       at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
> {noformat}
> A possible solution is to catch InvalidProtocolBufferException, log a 
> warning, and remove the application's folder that contains the invalid data, 
> so that the RM restart does not fail. 
> Additionally, I've added a catch for other exceptions that can occur while 
> recovering a specific application, so that the RM does not fail when only 
> one application's state cannot be loaded.
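
The tolerant approach from the issue description could look roughly like the following sketch. It is not the attached patch: `TolerantAppStateLoader`, `StateParser`, and `Result` are hypothetical names, the parser interface stands in for the protobuf `parseFrom` call, and the sketch only records the paths it skipped rather than deleting anything from HDFS.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the code from the attached patch. */
class TolerantAppStateLoader {

  /** Stand-in for a protobuf parseFrom call (assumed interface). */
  interface StateParser {
    String parse(byte[] data) throws IOException;
  }

  /** Recovered states plus the paths that could not be loaded. */
  static class Result {
    final Map<String, String> recovered = new LinkedHashMap<>();
    final List<String> skipped = new ArrayList<>();
  }

  /**
   * Loads every application it can. A failure in one application is
   * logged and recorded, and loading continues with the next one.
   */
  static Result loadAll(Map<String, byte[]> files, StateParser parser) {
    Result result = new Result();
    for (Map.Entry<String, byte[]> entry : files.entrySet()) {
      try {
        result.recovered.put(entry.getKey(), parser.parse(entry.getValue()));
      } catch (IOException | RuntimeException ex) {
        // InvalidProtocolBufferException is an IOException, so a corrupt
        // file lands here; other per-application failures do as well.
        System.err.println("WARN: skipping application state at "
            + entry.getKey() + ": " + ex);
        result.skipped.add(entry.getKey());
      }
    }
    return result;
  }
}
```

The caller can then decide what to do with `skipped`, for example removing those directories as the description proposes, or leaving them in place for later inspection.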


