zuston opened a new issue, #263:
URL: https://github.com/apache/incubator-uniffle/issues/263

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What would you like to be improved?
   
   Currently, a cluster of shuffle-servers is hard to update quickly. In an on-premise deployment, we have to write a custom Ansible playbook to perform the rolling update, and the upgrade process takes too long.
   
   The rolling update process has two steps:
   1. Add the IDs of the shuffle-servers to be excluded to the exclude-node file, so that the coordinator recognizes them and no longer assigns them to apps.
   2. Wait for all apps on these excluded shuffle-servers to finish.
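
The two steps above can be sketched as a small drain helper. This is only an illustration, not Uniffle code: the function names, the one-ID-per-line exclude file layout, and the `running_apps` callback (standing in for however the number of apps on a server is queried) are all assumptions.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def exclude_servers(exclude_file: Path, server_ids: Iterable[str]) -> None:
    """Step 1: append server IDs to the exclude file, skipping ones already listed.

    Assumes (hypothetically) one server ID per line.
    """
    existing = set(exclude_file.read_text().split()) if exclude_file.exists() else set()
    # dict.fromkeys drops duplicates in the input while keeping order.
    new_ids = [s for s in dict.fromkeys(server_ids) if s not in existing]
    with exclude_file.open("a") as f:
        for server_id in new_ids:
            f.write(server_id + "\n")


def wait_until_drained(
    server_ids: Iterable[str],
    running_apps: Callable[[str], int],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> bool:
    """Step 2: poll until no apps remain on any excluded server.

    `running_apps` is a hypothetical callback returning the app count for a server.
    Returns True once all servers are idle, False if the timeout expires first.
    """
    servers = list(server_ids)
    deadline = time.monotonic() + timeout_seconds
    while True:
        if all(running_apps(s) == 0 for s in servers):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

The waiting loop is where the time goes: a server can only be restarted once its longest-running app has finished, which is why the upgrade is slow.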
   
   Because shutting down a shuffle-server loses all app state and cleans up the data files, we have to handle it with care.
   
   If we could store all of the state in LevelDB, RocksDB, or a local file (like the YARN NodeManager), it might let us restart quickly without making apps fail.
   
   ### How should we improve?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!

