zuston opened a new issue, #263: URL: https://github.com/apache/incubator-uniffle/issues/263
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.

### What would you like to be improved?

Currently, a cluster of shuffle-servers is hard to update quickly. In an on-premise deployment, we have to write a custom Ansible playbook to perform the rolling update, and the upgrade process takes too long.

The rolling update process has two steps:

1. Add the IDs of the shuffle-servers to be excluded to the exclude-node file, so that the coordinator recognizes them and no longer assigns work to them.
2. Wait for all apps on these excluded shuffle-servers to finish.

Because shutting down a shuffle-server loses all app state and cleans up the data files, we have to handle this with care.

If we could store all of the state in LevelDB, RocksDB, or a local file (as the YARN NodeManager does), it might let us restart quickly without making apps fail.

### How should we improve?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
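
To make the local-file idea from the issue concrete, here is a minimal sketch of persisting per-app state so that a restarted server can pick it up again. It is only an illustration in Python, not Uniffle code: the `AppStateStore` class, its method names, the JSON layout, and the `shuffle_ids` field are all hypothetical, and a real shuffle-server would also have to keep its data files across the restart.

```python
import json
import os
import tempfile


class AppStateStore:
    """Hypothetical local-file store for per-app shuffle state (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.apps = {}  # app_id -> JSON-serializable state

    def register(self, app_id, state):
        """Record or update an app's state and persist it immediately."""
        self.apps[app_id] = state
        self._flush()

    def remove(self, app_id):
        """Forget an app once it has finished, then persist the change."""
        self.apps.pop(app_id, None)
        self._flush()

    def _flush(self):
        # Write to a temp file in the same directory, then rename it over the
        # target, so a crash mid-write never leaves a truncated state file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.apps, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    @classmethod
    def recover(cls, path):
        """Rebuild the store after a restart; start empty if no file exists."""
        store = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                store.apps = json.load(f)
        return store
```

With something along these lines, a server could call `AppStateStore.recover(path)` on startup instead of waiting for every app to drain before shutdown. An embedded store such as LevelDB or RocksDB would serve the same purpose with better write performance for large state.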