Robert Kanter created YARN-8310:
-----------------------------------

             Summary: Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
ContainerTokenIdentifier formats
                 Key: YARN-8310
                 URL: https://issues.apache.org/jira/browse/YARN-8310
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Robert Kanter
            Assignee: Robert Kanter


In some recent upgrade testing, we saw this error causing the NodeManager to 
fail to startup afterwards:
{noformat}
org.apache.hadoop.service.ServiceStateException: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message contained 
an invalid tag (zero).
        at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
        at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message 
contained an invalid tag (zero).
        at 
com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
        at 
com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
        at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
        at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
        at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
        at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
        at 
com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at 
org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
        at 
org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
        at 
org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
        at 
org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        ... 5 more
{noformat}
The NodeManager fails because it's trying to read a 
{{ContainerTokenIdentifier}} in the "old" format before we changed them to 
protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
similar problem with the ResourceManager and RM Delegation Tokens.

To provide a better experience, we should make the code able to read the old 
format if it's unable to read it using the new format.  We didn't run into any 
errors with the other two types of tokens that YARN-668 incompatibly changed 
(NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix those while 
we're at it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

Reply via email to