Oleksii Dymytrov created YARN-5924:
--------------------------------------

             Summary: Resource Manager fails to load state with InvalidProtocolBufferException
                 Key: YARN-5924
                 URL: https://issues.apache.org/jira/browse/YARN-5924
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.0.0-alpha1
            Reporter: Oleksii Dymytrov


An InvalidProtocolBufferException can be thrown while recovering an
application's state if the application's data under the
FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS has
an invalid format or is corrupted:
{noformat}
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
        at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
        at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
{noformat}

One possible solution is to catch InvalidProtocolBufferException, log a
warning, and remove the application's folder that contains the invalid data,
so that the RM restart does not fail.
Additionally, I've added a catch for other exceptions that can occur while
recovering a specific application, so that the RM does not fail even when only
a single application's state can't be loaded.
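
To illustrate the idea, here is a minimal, self-contained sketch of
per-application error handling during recovery. It is not the actual patch and
does not use FileSystemRMStateStore or protobuf: the class
AppStateRecoverySketch, the InvalidStateException type (a stand-in for
InvalidProtocolBufferException), the parseAppState method and its toy format
check, and the in-memory map that stands in for the per-application folders in
HDFS are all hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class AppStateRecoverySketch {

  /** Hypothetical stand-in for com.google.protobuf.InvalidProtocolBufferException. */
  static class InvalidStateException extends Exception {
    InvalidStateException(String message) {
      super(message);
    }
  }

  /**
   * Hypothetical parser. The toy rule (payload must start with byte 0x0A) is
   * an assumption for this sketch, not the real ApplicationStateDataProto format.
   */
  static String parseAppState(byte[] data) throws InvalidStateException {
    if (data.length == 0 || data[0] != 0x0A) {
      throw new InvalidStateException("malformed application state");
    }
    return new String(data, 1, data.length - 1, StandardCharsets.UTF_8);
  }

  /**
   * Loads every application's state from the store. An entry that fails to
   * parse is logged and removed; any other failure is logged and skipped.
   * In both cases recovery continues with the remaining applications.
   */
  static Map<String, String> loadAppStates(Map<String, byte[]> store) {
    Map<String, String> loaded = new LinkedHashMap<>();
    Iterator<Map.Entry<String, byte[]>> it = store.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, byte[]> entry = it.next();
      String appId = entry.getKey();
      try {
        loaded.put(appId, parseAppState(entry.getValue()));
      } catch (InvalidStateException e) {
        // Invalid data: warn and remove the entry (the "application folder").
        System.err.println("WARN: removing invalid state for " + appId
            + ": " + e.getMessage());
        it.remove();
      } catch (Exception e) {
        // Any other per-application failure: warn and keep going.
        System.err.println("WARN: could not load state for " + appId + ": " + e);
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    Map<String, byte[]> store = new LinkedHashMap<>();
    store.put("app_ok", new byte[] {0x0A, 111, 107});  // 0x0A followed by "ok"
    store.put("app_bad", new byte[] {0x0B});           // fails the toy check

    Map<String, String> loaded = loadAppStates(store);
    System.out.println("Recovered: " + loaded.keySet());
    System.out.println("Remaining in store: " + store.keySet());
  }
}
```

The point of the sketch is that the try/catch sits inside the loop over
applications, so a single bad entry cannot abort the whole load. Whether
invalid data should be deleted outright or moved aside for later inspection is
a design choice this sketch does not settle.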





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
