[ https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oleksii Dymytrov reassigned YARN-5924:
--------------------------------------

    Assignee: Oleksii Dymytrov

> Resource Manager fails to load state with InvalidProtocolBufferException
> ------------------------------------------------------------------------
>
>                 Key: YARN-5924
>                 URL: https://issues.apache.org/jira/browse/YARN-5924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Oleksii Dymytrov
>            Assignee: Oleksii Dymytrov
>         Attachments: YARN_5924_v1_001.patch
>
>
> An InvalidProtocolBufferException can be thrown while recovering an
> application's state if the application's data under the
> FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS
> has an invalid format (or is corrupted):
> {noformat}
> com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
>       at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
>       at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
>       at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>       at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
>       at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
> {noformat}
> One possible solution is to catch InvalidProtocolBufferException, log a
> warning, and remove the application's folder that contains the invalid data,
> so that an RM restart does not fail.
> Additionally, I've added a catch for other exceptions that can occur while
> recovering a specific application, so that the RM does not fail even if
> only one application's state can't be loaded.
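
The recovery behaviour proposed above could look roughly like the sketch below. This is not the attached patch and does not use the Hadoop or protobuf APIs. It runs against the local filesystem through java.nio.file, and every name in it (RecoverySketch, InvalidStateException, parseAppState, loadApps, the 0x0A marker byte, and the assumption that each state file is named after its application directory) is hypothetical and chosen only for illustration. InvalidStateException stands in for InvalidProtocolBufferException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecoverySketch {

    /** Hypothetical stand-in for com.google.protobuf.InvalidProtocolBufferException. */
    static class InvalidStateException extends IOException {
        InvalidStateException(String message) { super(message); }
    }

    /** Hypothetical parser: accepts only payloads whose first byte is 0x0A. */
    static byte[] parseAppState(byte[] data) throws InvalidStateException {
        if (data.length == 0 || data[0] != 0x0A) {
            throw new InvalidStateException("state data has invalid format");
        }
        return data;
    }

    /** Tries to load every application directory; returns the names that were recovered. */
    static List<String> loadApps(Path rmAppRoot) throws IOException {
        List<Path> appDirs;
        try (Stream<Path> children = Files.list(rmAppRoot)) {
            appDirs = children.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
        List<String> recovered = new ArrayList<>();
        for (Path appDir : appDirs) {
            String appId = appDir.getFileName().toString();
            try {
                // Assumption: the state file carries the same name as its directory.
                parseAppState(Files.readAllBytes(appDir.resolve(appId)));
                recovered.add(appId);
            } catch (InvalidStateException e) {
                // Invalid data: warn and remove the folder so the next start is clean.
                System.err.println("WARN: removing " + appDir + ": " + e.getMessage());
                deleteRecursively(appDir);
            } catch (Exception e) {
                // Any other per-application failure: warn and continue with the rest.
                System.err.println("WARN: skipping " + appDir + ": " + e);
            }
        }
        return recovered;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Calling RecoverySketch.loadApps on a root that holds one valid and one invalid application directory returns only the valid application's name and deletes the invalid directory. Whether deleting the data is preferable to moving it aside for later inspection is a design choice this sketch does not settle.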