The installation instructions do not indicate how to create systemd services.
1- When a task node fails, does the job leader detect this and SSH into the node to restart it? From my testing, it does not seem to.
2- How do we recover a lost node? Do we simply go back to the master node and run start-cluster.sh, and is the script smart enough to figure out what is missing?
3- Or do we need to create systemd services, and if so, which command should the service run?