[ https://issues.apache.org/jira/browse/YARN-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479160#comment-16479160 ]
Gergo Repas commented on YARN-8310:
-----------------------------------
[~rkanter] Thanks for working on this patch. Other than having checkstyle and
asflicense warnings, LGTM.
> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and
> ContainerTokenIdentifier formats
> ---------------------------------------------------------------------------------------
>
> Key: YARN-8310
> URL: https://issues.apache.org/jira/browse/YARN-8310
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Priority: Major
> Attachments: YARN-8310.001.patch, YARN-8310.branch-2.001.patch
>
>
> In some recent upgrade testing, we saw this error, which caused the
> NodeManager to fail to start up afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
> at com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
> at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
> at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
> at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
> at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
> at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
> at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
> at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
> at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
> at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a
> {{ContainerTokenIdentifier}} written in the "old" format that predates the
> switch to protobufs (YARN-668). This is very similar to YARN-5594, where we
> hit the same kind of problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code fall back to reading
> the old format when it's unable to read a token using the new format. We
> didn't run into any errors with the other two types of tokens that YARN-668
> incompatibly changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may
> as well fix those while we're at it.
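
For readers unfamiliar with the approach described above, here is a minimal, self-contained sketch of "try the new format, fall back to the old one". It is not the attached patch and uses neither Hadoop nor protobuf. The class TokenFormatFallback, the Identifier type, the InvalidTagException, and both toy wire formats are hypothetical stand-ins: the toy "new" format prefixes each field with a non-zero tag byte, so old-format bytes that begin with a zero byte fail in a way that resembles the "invalid tag (zero)" error in the trace.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TokenFormatFallback {

  /** Hypothetical, simplified identifier; not the real ContainerTokenIdentifier. */
  static final class Identifier {
    final int containerId;
    final String user;

    Identifier(int containerId, String user) {
      this.containerId = containerId;
      this.user = user;
    }
  }

  /** Stand-in for the parse failure raised by the new-format parser. */
  static final class InvalidTagException extends RuntimeException {
    InvalidTagException(int tag) {
      super("Message contained an invalid tag (" + tag + ")");
    }
  }

  // Toy "old" format: raw int, then a length-prefixed UTF-8 string, no tags.
  static byte[] writeOld(Identifier id) {
    byte[] user = id.user.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(4 + 4 + user.length)
        .putInt(id.containerId).putInt(user.length).put(user).array();
  }

  // Toy "new" format: each field is preceded by a non-zero tag byte.
  static byte[] writeNew(Identifier id) {
    byte[] user = id.user.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(1 + 4 + 1 + 4 + user.length)
        .put((byte) 1).putInt(id.containerId)
        .put((byte) 2).putInt(user.length).put(user).array();
  }

  static String readString(ByteBuffer in) {
    int len = in.getInt();
    if (len < 0 || len > in.remaining()) {
      throw new BufferUnderflowException();
    }
    byte[] bytes = new byte[len];
    in.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  static void expectTag(ByteBuffer in, int expected) {
    int tag = in.get();
    if (tag != expected) {
      throw new InvalidTagException(tag);
    }
  }

  static Identifier parseNew(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data);
    expectTag(in, 1);
    int id = in.getInt();
    expectTag(in, 2);
    return new Identifier(id, readString(in));
  }

  static Identifier parseOld(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data);
    int id = in.getInt();
    return new Identifier(id, readString(in));
  }

  /** Try the new format first; on a parse failure, re-read the same bytes as the old format. */
  static Identifier read(byte[] data) {
    try {
      return parseNew(data);
    } catch (InvalidTagException | BufferUnderflowException e) {
      return parseOld(data);
    }
  }

  public static void main(String[] args) {
    Identifier original = new Identifier(42, "alice");
    Identifier fromOld = read(writeOld(original));
    Identifier fromNew = read(writeNew(original));
    System.out.println(fromOld.containerId + " " + fromOld.user);
    System.out.println(fromNew.containerId + " " + fromNew.user);
  }
}
```

The sketch buffers the whole payload as a byte array so that the same bytes can be re-read after the first attempt fails. A fallback like this only works when old-format data reliably fails to parse as the new format; in this toy example, an old-format id whose first byte happened to be 1 could be misread, so a real implementation would need to consider that case.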
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)