At Sun, 08 Feb 2015 21:15:19 +0100,
Corin Langosch wrote:
>
> Hi guys,
>
> I'm currently digging around in the sheepdog sources and have a few questions
> regarding recovery and object consistency.
> Please correct me if I'm wrong in anything I write here - it's all just pieced
> together from various documents and source files.
>
> Sheepdog keeps track of which nodes are alive at a given point in time in an
> epoch object. Every time a node joins/leaves the cluster a new epoch is
> generated. A history of all epochs is kept. Objects are mapped to nodes using
> consistent hashing; the object's ec-chunks are simply assigned in order to the
> neighboring nodes. Using the epoch history we can map the same object to the
> same node for any past cluster state.
>
> As for recovery, please consider the following cluster history and an object
> A (2:1 ec):
>
> E  Nodes                         Placement of chunks
> 1  []
>    - node1 joins
> 2  [node1]                       not enough nodes
>    - node2 joins
> 3  [node1, node2]                A1=node2,A3=node1
>    - node3 joins
> 4  [node1, node2, node3]         A1=node2,A2=node3,A3=node1
>    - node4 joins, A3 is moved to its new place
> 5  [node1, node2, node3, node4]  A1=node2,A2=node3,A3=node4
>    - node4 crashes, A3 is recovered from A1+A2
> 6  [node1, node2, node3]         A1=node2,A2=node3,A3=node1
>    - whole cluster crashes
> 7  []
>    - node4 joins
> 8  [node4]                       A3=node4 (no access, not enough nodes)
>    - node3 joins
> 9  [node3, node4]                A3=node4,A2=node3 (access, but A3 is
>                                  outdated!!!)
>
> How do you prevent the outdated version of A3 on node4 from being used? The
> latest version of A3 is on node1 (epoch 6), but how do we know this by only
> keeping track of the epochs? Afaik there's no central repository which holds
> all object/chunk versions?
>
> Thank you in advance :)

At epochs 8 and 9, clients cannot access sheepdog because not all members of
the latest healthy epoch (in this case, 6) have gathered yet. In such a case,
the output of the cluster info command looks like this:

$ dog cluster info                                      (git)-[vid-overflow]
Cluster status: Waiting for other nodes to join cluster
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Cluster created at Tue Feb 10 16:20:10 2015
...

So I/O to outdated objects is prevented in the above case. Access is allowed
after nodes 1, 2, 3, and 4 have gathered in your example above.

After enough members of the latest healthy epoch have gathered, the sheeps run
the recovery process. The recovery process is simple:

1. exchange information about owned objects with each other
2. list the objects which should belong to me
3. E <- the latest epoch
4. read an object from the sheeps assigned at epoch E; these sheeps are
   calculated by consistent hashing
5. if no sheep process has the object, E <- E - 1 and go back to 4

Steps 3 - 5 are repeated until all objects are recovered.

So you don't need to worry about access to outdated objects :)

I understand your concern well. This is a really subtle and important point of
distributed storage systems, including sheepdog.

Thanks,
Hitoshi

-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
https://lists.wpkg.org/mailman/listinfo/sheepdog
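
For illustration only, the gating check and the epoch fallback described in the
reply could be sketched roughly as below. This is not taken from the sheepdog
sources: the names can_serve_io, place and recover are hypothetical, the ring is
a simplified consistent-hashing stand-in, the gate assumes that every member of
the latest healthy epoch must be back before I/O is allowed, and erasure-coded
chunks are ignored in favor of whole objects.

```python
import hashlib
from bisect import bisect_left


def _h(key):
    """Map a string to a point on the hash ring (assumed hash, not sheepdog's)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def can_serve_io(joined, healthy_members):
    """Assumption: I/O is allowed only once all members of the latest
    healthy epoch have rejoined the cluster."""
    return set(healthy_members) <= set(joined)


def place(oid, nodes, copies):
    """Pick up to `copies` nodes for `oid`, walking clockwise on the ring."""
    ring = sorted(nodes, key=_h)
    if not ring:
        return []
    start = bisect_left([_h(n) for n in ring], _h(oid)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(copies, len(ring)))]


def recover(oid, epochs, stores, copies=1):
    """Steps 3 - 5 of the reply: start at the latest epoch and fall back to
    older epochs until some node that owned `oid` at that epoch still has it.

    epochs: dict mapping epoch number -> list of member nodes
    stores: dict mapping node name -> dict of object id -> data
    Returns (epoch, node, data), or None if no epoch yields the object.
    """
    e = max(epochs)                      # step 3: E <- the latest epoch
    while e >= min(epochs):
        for node in place(oid, epochs[e], copies):   # step 4
            if oid in stores.get(node, {}):
                return e, node, stores[node][oid]
        e -= 1                           # step 5: E <- E - 1, try again
    return None


if __name__ == "__main__":
    # Hypothetical two-epoch history: the object lives only on the old member.
    epochs = {1: ["node1"], 2: ["node2"]}
    stores = {"node1": {"A": b"v1"}, "node2": {}}
    print(recover("A", epochs, stores))
```

With the scenario from the question, can_serve_io(["node3", "node4"],
["node1", "node2", "node3"]) is false, which corresponds to the "Waiting for
other nodes to join cluster" state, so the outdated copy on node4 is never
served.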