[ https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084766#comment-16084766 ]
Jason Lowe commented on YARN-6798:
----------------------------------

IMHO we should only need to bump the major version if any of the following are true:
* Older NM software will explode when it tries to recover the state store
* Older NM software fails to do something crucial during recovery due to ignoring something in the state store

Otherwise we can keep the major version the same and simply bump the minor version. It looks like the two features were added to the state store in a way that lets us remain on 1.x, but I haven't dug into it deeply enough to be sure.

bq. This will be incompatible the previous alphas and anyone running directly from branch-2 builds.

True, but that's the risk of running on unreleased software (as is the case with branch-2). Anyone could check in something that isn't backwards-compatible and needs to be fixed later, and that could break users who happened to deploy in between. AFAIK we don't make any commitments to compatibility except for official Apache Hadoop releases.

I would argue the same applies to alpha releases. The whole point of calling it alpha is to convey that APIs may be unstable and could disappear or change in an incompatible way in the next release. It will be annoying to users who expect to do a rolling upgrade from 3.0-alphaX, but given the "alpha" tag I would not expect anyone to have deployed this in a production environment where they cannot live with downtime when upgrading to a subsequent release.

It would be helpful to have a release note that calls out the incompatibility with 3.0-alpha releases and states that users upgrading from one of those releases will need to erase the NM state store on each node before upgrading.
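To make the major/minor rule in the comment above concrete, here is a minimal sketch of a version check in which only a major version mismatch blocks recovery. The StateVersionSketch and StateVersion names are hypothetical stand-ins for illustration, not the actual Hadoop classes, and the assumption that "compatible" means "same major version" is taken from the reasoning in the comment rather than from the NM source.

```java
import java.io.IOException;

public class StateVersionSketch {

    /** Hypothetical stand-in for a state store version (not the Hadoop Version record). */
    public static final class StateVersion {
        public final int major;
        public final int minor;

        public StateVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** Assumption: two versions are compatible when their major versions match. */
        public boolean isCompatibleTo(StateVersion other) {
            return this.major == other.major;
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /** Returns true when a store written at 'loaded' can be recovered by software at 'current'. */
    public static boolean isRecoverable(StateVersion loaded, StateVersion current) {
        return loaded.isCompatibleTo(current);
    }

    /** Fails startup on a major version mismatch, mirroring the error text in the report. */
    public static void checkVersion(StateVersion loaded, StateVersion current)
            throws IOException {
        if (!isRecoverable(loaded, current)) {
            throw new IOException(
                "Incompatible version for NM state: expecting NM state version "
                    + current + ", but loading version " + loaded);
        }
    }

    public static void main(String[] args) {
        // A minor bump (1.0 -> 1.1) keeps the store recoverable.
        System.out.println(isRecoverable(new StateVersion(1, 0), new StateVersion(1, 1)));

        // A major bump (2.0 -> 3.0) is rejected, as in the alpha2 to alpha4 upgrade.
        try {
            checkVersion(new StateVersion(2, 0), new StateVersion(3, 0));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this scheme, a feature that older NM software can safely ignore only needs a minor bump, so rolling upgrades between releases on the same major version keep working.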

> NM startup failure with old state store due to version mismatch
> ---------------------------------------------------------------
>
>                 Key: YARN-6798
>                 URL: https://issues.apache.org/jira/browse/YARN-6798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>         Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.
> YARN-6127 bumped the version for the NM to 3.0
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2017-07-07 15:48:17,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}