[ 
https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksii Dymytrov updated YARN-5924:
-----------------------------------
    Description: 
InvalidProtocolBufferException is thrown while recovering an application's 
state if the application's data under the 
FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS has 
an invalid format (or is corrupted):
{noformat}
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
        at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
        at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
{noformat}

A possible solution is to catch InvalidProtocolBufferException, log a warning, 
and remove the application's folder that contains the invalid data, so that 
the RM restart does not fail. 
Additionally, I've added a catch for other exceptions that can occur while 
recovering a specific application, so that the RM does not fail when a single 
application's state can't be loaded.
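
To illustrate the idea, here is a minimal, self-contained sketch of the per-application error handling. It is not the attached patch and does not use the Hadoop or protobuf APIs: TolerantAppRecovery, InvalidStateException (a stand-in for InvalidProtocolBufferException), parseAppState and the "first byte must be 0x0A" validity rule are all hypothetical, and a local temporary directory stands in for RMAppRoot in HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TolerantAppRecovery {

    // Hypothetical stand-in for com.google.protobuf.InvalidProtocolBufferException.
    static class InvalidStateException extends IOException {
        InvalidStateException(String message) { super(message); }
    }

    // Hypothetical parser: a file counts as valid only if it is non-empty
    // and its first byte is 0x0A. Real code would parse a protobuf message here.
    static void parseAppState(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (data.length == 0 || data[0] != 0x0A) {
            throw new InvalidStateException("Invalid state data in " + file);
        }
    }

    // Lists the direct children of a directory in sorted order.
    static List<Path> children(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.sorted().collect(Collectors.toList());
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteTree(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> s = Files.walk(dir)) {
            paths = s.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Loads every application directory; a failure in one application is
    // logged and does not abort the recovery of the others.
    static List<String> loadApps(Path rmAppRoot) throws IOException {
        List<String> recovered = new ArrayList<>();
        for (Path appDir : children(rmAppRoot)) {
            String appId = appDir.getFileName().toString();
            try {
                for (Path file : children(appDir)) {
                    parseAppState(file);
                }
                recovered.add(appId);
            } catch (InvalidStateException e) {
                // Invalid data: warn and remove the application's folder.
                System.err.println("WARN: removing invalid state of " + appId + ": " + e.getMessage());
                try {
                    deleteTree(appDir);
                } catch (IOException deleteFailure) {
                    System.err.println("WARN: could not remove " + appDir + ": " + deleteFailure);
                }
            } catch (Exception e) {
                // Any other per-application failure: warn and keep going.
                System.err.println("WARN: skipping " + appId + ": " + e);
            }
        }
        return recovered;
    }

    // Builds one valid and one invalid application directory, then recovers them.
    static String runDemo() {
        try {
            Path root = Files.createTempDirectory("RMAppRoot");
            Path good = Files.createDirectories(root.resolve("application_good"));
            Path bad = Files.createDirectories(root.resolve("application_bad"));
            Files.write(good.resolve("state"), new byte[] {0x0A, 0x01});
            Files.write(bad.resolve("state"), new byte[] {0x7F});
            List<String> recovered = loadApps(root);
            boolean badRemoved = !Files.exists(bad);
            deleteTree(root);
            return "recovered=" + recovered + " badRemoved=" + badRemoved;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The try/catch sits inside the per-application loop, so an exception only affects the application being processed. Whether deleting the folder is preferable to skipping it and keeping the data for inspection is a design choice that this sketch does not settle.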



  was:
InvalidProtocolBufferException can be thrown during recovering of the 
application's state if application's data has invalid format (or is broken) 
under FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS:
{noformat}
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group 
tag did not match expected tag.

        at 
com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
        at 
com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
        at 
com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
        at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
{noformat}

The solution can be to catch "InvalidProtocolBufferException", show warning and 
remove application's folder that contains invalid data to prevent RM restart 
failure. 
Additionally, I've added catch for other exceptions that can appear during 
recovering of the specific application, to avoid RM failure even if the only 
one application's state can't be loaded.




> Resource Manager fails to load state with InvalidProtocolBufferException
> ------------------------------------------------------------------------
>
>                 Key: YARN-5924
>                 URL: https://issues.apache.org/jira/browse/YARN-5924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Oleksii Dymytrov
>            Assignee: Oleksii Dymytrov
>         Attachments: YARN_5924_v1_001.patch
>


