[ https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621049#comment-16621049 ]
Eric Yang commented on YARN-8665:
---------------------------------

[~csingh] Thank you for the patch.

When cancel upgrade is triggered, the app master seems to reset all instances to the NEEDS_UPGRADE state and set the service state to CANCEL_UPGRADING. This makes it hard to identify whether an instance should be restarted with the original configuration. I think we can avoid resetting all instances to NEEDS_UPGRADE. Instead, instances in the READY/FAILED_UPGRADE state should be marked NEEDS_UPGRADE, and instances in the NEEDS_UPGRADE state should be reset to RUNNING_BUT_NOT_READY to revert the process. This inversion approach might work better to restore the service to its original form without version control of the reinit process.

If you choose to stay the course with the current implementation, the node manager's report back to the app master might need a new mechanism for versioning the operation performed. That would help track whether a reinit operation was performed for an upgrade or for an upgrade cancel, which you described as a separate JIRA. However, I would feel more comfortable solving the problem in this JIRA to make sure we don't destabilize the code base.

I also tried launching the app, triggering an upgrade with the -initiate flag, and then cancelling with the -cancel flag without actually upgrading any instance. When this is done, the service gets stuck in the CANCEL_UPGRADING state and does not revert to the STABLE state.

> Yarn Service Upgrade: Support cancelling upgrade
> -------------------------------------------------
>
>                 Key: YARN-8665
>                 URL: https://issues.apache.org/jira/browse/YARN-8665
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-8665.001.patch
>
>
> When a service is upgraded without auto-finalization or express upgrade, then
> the upgrade can be cancelled. This provides the user ability to test upgrade
> of a single instance and if that doesn't go well, they get a chance to cancel
> it.
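The inversion suggested in the comment above can be sketched roughly as follows. This is only an illustration of the proposed state mapping, not code from the patch or from the YARN service framework: the class `CancelUpgradeSketch`, the enum `InstanceState`, and the method `onCancel` are hypothetical names, and the enum lists only the four instance states mentioned in the comment. It also assumes that an instance in READY or FAILED_UPGRADE has already been touched by the upgrade, while one still in NEEDS_UPGRADE has not.

```java
// Hypothetical sketch of the "inversion" idea; names are illustrative only.
class CancelUpgradeSketch {

    // Only the instance states named in the comment; the real set may differ.
    enum InstanceState { READY, FAILED_UPGRADE, NEEDS_UPGRADE, RUNNING_BUT_NOT_READY }

    // Maps an instance's state at the moment cancel is requested to the
    // state it would hold while the cancel is in progress.
    static InstanceState onCancel(InstanceState current) {
        switch (current) {
            case READY:
            case FAILED_UPGRADE:
                // Assumed to be already upgraded (or attempted): mark it so
                // it gets reinitialized back to the original version.
                return InstanceState.NEEDS_UPGRADE;
            case NEEDS_UPGRADE:
                // Assumed never upgraded: nothing to revert, so drop the
                // pending-upgrade marker instead of restarting it.
                return InstanceState.RUNNING_BUT_NOT_READY;
            default:
                // Any other state is left unchanged in this sketch.
                return current;
        }
    }
}
```

With a mapping like this, a cancel issued before any instance has been upgraded would leave no instance in NEEDS_UPGRADE, which is one condition under which the service could move straight back to STABLE rather than waiting in CANCEL_UPGRADING.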