Signed-off-by: Valerio Pachera <siri...@gmail.com>
---
 doc/fail_over.rst | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 doc/fail_over.rst
diff --git a/doc/fail_over.rst b/doc/fail_over.rst
new file mode 100644
index 0000000..892d79d
--- /dev/null
+++ b/doc/fail_over.rst
@@ -0,0 +1,36 @@
+Fail Over
+=========
+
+Now we are able to manage guests on our cluster, and we want to check
+whether it can really survive the loss of a node.
+Start a guest on any of the nodes.
+Find the ID of the node you wish to fail with *'dog node list'*
+(not the node where the guest is running, of course).
+Then kill that node:
+
+::
+
+   # dog node kill 3
+
+The guest keeps running without any problem, and 'dog node list' will show
+that one node is missing.
+
+But how do we know whether sheepdog is recovering the "lost" data?
+
+*(At this very moment, some objects have only 1 copy instead of 2.
+The second copy has to be rebuilt on the remaining active nodes.)*
+
+::
+
+   # dog node recovery
+   Nodes In Recovery:
+   Id   Host:Port           V-Nodes   Zone
+    0   192.168.2.41:7000        50   688040128
+    1   192.168.2.42:7000        50   704817344
+    2   192.168.2.43:7000        92   721594560
+
+Here you can see which nodes are receiving data.
+Once recovery is done, the list will be empty.
+
+**IMPORTANT:**
+do not remove other nodes from the cluster while recovery is in progress!
--
1.7.10.4
--
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog
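
The final warning can be automated before any maintenance script removes a node. Below is a minimal shell sketch that polls ``dog node recovery`` until the list is empty, assuming the output format shown in the doc (a two-line header followed by one line per node still recovering). The ``wait_for_recovery`` function and the ``DOG`` override are hypothetical names introduced here for illustration, not part of sheepdog:

```shell
#!/bin/sh
# Sketch: block until sheepdog reports that object recovery is finished.
# Assumes "dog node recovery" prints a two-line header plus one line per
# node still recovering, as shown in the transcript above.
# DOG is overridable so the sketch can be exercised without a live cluster.
DOG="${DOG:-dog}"

wait_for_recovery() {
    while :; do
        # Count output lines; only the header (2 lines) means the
        # recovery list is empty and all copies have been rebuilt.
        lines=$("$DOG" node recovery | wc -l)
        if [ "$lines" -le 2 ]; then
            echo "recovery complete"
            return 0
        fi
        sleep 5   # poll again; recovery can take a while
    done
}
```

A maintenance script could then call ``wait_for_recovery`` before issuing the next ``dog node kill`` or shutting another node down.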