[ https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084766#comment-16084766 ]

Jason Lowe commented on YARN-6798:
----------------------------------

IMHO we should only need to bump the major version if any of the following are
true:
* Older NM software will explode when it tries to recover the state store
* Older NM software fails to do something crucial during recovery due to ignoring something in the state store

Otherwise, we can keep the major version the same and simply bump the minor
version.  It looks like the two features (YARN-5049 and YARN-6127) were added to
the state store in a way that lets us remain on 1.x, but I haven't dug into it
deeply enough to be sure.
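The major/minor rule above can be sketched in Java. This is a minimal, hypothetical illustration, not Hadoop code: the `Version` record and the `nextVersion`, `isCompatible` and `checkVersion` helpers are invented names, and the sketch assumes (as the rule implies) that an NM can recover a store only when the store's major version matches its own. The real check is `NMLeveldbStateStoreService.checkVersion` (visible in the stack trace in the issue description); this sketch does not reproduce it.

```java
import java.io.IOException;

// Hypothetical sketch of the state-store versioning rule discussed above.
// Version, nextVersion, isCompatible and checkVersion are illustrative names,
// not Hadoop's own classes or methods. Requires Java 16+ (records).
public class StateStoreVersionSketch {

    /** A major.minor state-store version. */
    record Version(int major, int minor) {
        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /**
     * Bump the major version only when older NM software would crash, or would
     * silently skip something crucial, while recovering the new store format.
     * Otherwise bump the minor version.
     */
    static Version nextVersion(Version current, boolean breaksOlderNodeManagers) {
        return breaksOlderNodeManagers
                ? new Version(current.major() + 1, 0)
                : new Version(current.major(), current.minor() + 1);
    }

    /** Assumption: a stored version is loadable iff its major version matches. */
    static boolean isCompatible(Version loaded, Version current) {
        return loaded.major() == current.major();
    }

    /** Refuses to start on an incompatible store, like the failure in the log. */
    static void checkVersion(Version loaded, Version current) throws IOException {
        if (!isCompatible(loaded, current)) {
            throw new IOException("Incompatible state store version: expecting "
                    + current + ", but loading " + loaded);
        }
    }

    public static void main(String[] args) {
        Version released = new Version(1, 0);
        Version additive = nextVersion(released, false); // 1.1: minor bump
        Version breaking = nextVersion(released, true);  // 2.0: major bump

        for (Version current : new Version[] {additive, breaking}) {
            try {
                checkVersion(released, current);
                System.out.println(current + " can recover a " + released + " store");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Running `main` shows the practical difference: an NM at 1.1 recovers a 1.0 store, while an NM at 2.0 refuses it, which is the same shape of failure the alpha2 to alpha4 upgrade hit with 2.0 and 3.0.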

bq. This will be incompatible with the previous alphas and anyone running directly 
from branch-2 builds.
True, but that's the risk of running on unreleased software (as is the case 
with branch-2).  Anyone could check in something that isn't 
backward-compatible and needs to be fixed later, and that could break users 
who happened to deploy in between.  AFAIK we don't make any commitments to 
compatibility except for official Apache Hadoop releases.

I would argue the same applies to alpha releases.  The whole point of calling 
it alpha is to convey that APIs may be unstable and could disappear or change 
in an incompatible way in the next release.  It will be annoying to users who 
expect to do a rolling upgrade from 3.0-alphaX, but given the "alpha" tag I 
would not expect anyone to have deployed this in a production environment where 
they cannot tolerate downtime when upgrading to a subsequent release.

It would be helpful to have a release note that calls out the incompatibility 
with the 3.0-alpha releases and states that users upgrading from one of those 
releases will need to erase the NM state store on each node before upgrading.
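For operators following such a release note, erasing the store amounts to recursively deleting the NM state store directory on each node while that NM is stopped. The sketch below is a hypothetical helper (`EraseNmStateStore` is an invented name, not a Hadoop tool). It assumes you pass the store's location yourself; that location depends on your configuration (the NM recovery directory is set by `yarn.nodemanager.recovery.dir`), so verify the path before deleting anything.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical helper: recursively deletes an NM state store directory so the
 * upgraded NM starts with an empty store. Run it only while the NM is stopped,
 * and only against a path you have verified is the NM recovery store.
 */
public class EraseNmStateStore {

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // nothing to erase
        }
        // Reverse lexicographic order visits children before their parents,
        // so every directory is already empty when Files.delete reaches it.
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: EraseNmStateStore <nm-state-store-dir>");
            System.exit(1);
        }
        Path dir = Paths.get(args[0]);
        deleteRecursively(dir);
        System.out.println("Erased NM state store at " + dir);
    }
}
```

Deleting the store discards any recovery state, so containers running on that node at the time of the upgrade will not be recovered; that is the downtime the alpha label is meant to make acceptable.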

> NM startup failure with old state store due to version mismatch
> ---------------------------------------------------------------
>
>                 Key: YARN-6798
>                 URL: https://issues.apache.org/jira/browse/YARN-6798
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>         Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.
> YARN-6127 bumped the version for the NM to 3.0:
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0:
>     private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting NM state version 3.0, but loading version 2.0
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
>         at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2017-07-07 15:48:17,277 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
