Oleksii Dymytrov created YARN-5924: -------------------------------------- Summary: Resource Manager fails to load state with InvalidProtocolBufferException Key: YARN-5924 URL: https://issues.apache.org/jira/browse/YARN-5924 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0-alpha1 Reporter: Oleksii Dymytrov
InvalidProtocolBufferException can be thrown during recovering of the application's state if application's data will have invalid format (or will be broken) under FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS: {noformat} com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag. at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94) at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124) at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193) at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232) {noformat} The solution can be to catch "InvalidProtocolBufferException", show warning and remove application's folder that contains invalid data to prevent RM restart failure. Additionally, I've added catch for other exceptions that can appear during recovering of the specific application, to avoid RM failure even if the only one application's state can't be loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org