Assume you have a 30-node cluster distributed across three AZs with an RF of 3. I am trying to come up with a runbook to manage multi-node failures as a result of …
- Loss of the entire AZ1
- Loss of multiple nodes in AZ2
- AZ3 unaffected, no node loss

Is this the optimal plan? Replacing dead nodes via bootstrapping …

1. Replace seed nodes first (via bootstrap)
2. Bootstrap the few nodes in AZ2
3. Bootstrap all nodes in AZ1
4. Run a cluster repair.

Do you wait to bootstrap everything before running repair, or do you repair per node? Did I miss anything?

----------------
Thank you
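
P.S. To make the ordering in steps 1 to 3 concrete, here is a minimal Python sketch of how I picture the replacement queue. Everything in it is hypothetical and only for illustration: the node names, one seed per AZ, three lost nodes in AZ2, and the `replacement_order` helper are my assumptions, not part of any real tooling. It only orders the dead nodes; it does not perform any bootstrap or repair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    az: str
    is_seed: bool
    alive: bool


def replacement_order(nodes):
    """Order dead nodes as proposed: seeds first, then AZ2, then AZ1."""
    az_priority = {"AZ2": 0, "AZ1": 1}  # partially failed AZ before the lost AZ
    dead = [n for n in nodes if not n.alive]
    return sorted(
        dead,
        key=lambda n: (not n.is_seed, az_priority.get(n.az, 2), n.name),
    )


def build_example_cluster():
    """30 nodes, 10 per AZ. Assumed: node 0 in each AZ is a seed."""
    nodes = []
    for az in ("AZ1", "AZ2", "AZ3"):
        for i in range(10):
            if az == "AZ1":
                alive = False        # entire AZ lost
            elif az == "AZ2":
                alive = i >= 3       # assumed: nodes 0..2 lost
            else:
                alive = True         # AZ3 unaffected
            nodes.append(Node(f"{az.lower()}-n{i}", az, i == 0, alive))
    return nodes


order = replacement_order(build_example_cluster())
for position, node in enumerate(order, start=1):
    role = "seed" if node.is_seed else "node"
    print(f"{position:2d}. replace {node.name} ({node.az}, {role})")
```

With these assumptions the queue holds 13 dead nodes: the two dead seeds, then the remaining AZ2 nodes, then the remaining AZ1 nodes. Whether repair should run once at the end or after each replacement is exactly the part I am unsure about, so the sketch leaves step 4 out.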