At Thu, 9 Aug 2012 16:43:38 +0800,
Yunkai Zhang wrote:
>
> From: Yunkai Zhang <qiushu....@taobao.com>
>
> V2:
> - fix a typo
> - when an object is updated, delete it old version
> - reset cluster recovery state in finish_recovery()
>
> Yunkai Zhang (11):
>   sheep: enable variale-length of join_message in response of join
>     event
>   sheep: share joining nodes with newly added sheep
>   sheep: delay to process recovery caused by LEAVE event just like JOIN
>     event
>   sheep: don't cleanup working directory when sheep joined back
>   sheep: read objects only from live nodes
>   sheep: write objects only on live nodes
>   sheep: mark dirty object that belongs to the leaving nodes
>   sheep: send dirty object list to each sheep when cluster do recovery
>   sheep: do recovery with dirty object list
>   collie: update 'collie cluster recover info' commands
>   collie: update doc about 'collie cluster recover disable'
>
>  collie/cluster.c          |   46 ++++++++---
>  include/internal_proto.h  |   32 ++++++--
>  include/sheep.h           |   23 ++++++
>  man/collie.8              |    2 +-
>  sheep/cluster.h           |   29 +------
>  sheep/cluster/accord.c    |    2 +-
>  sheep/cluster/corosync.c  |    9 ++-
>  sheep/cluster/local.c     |    2 +-
>  sheep/cluster/zookeeper.c |    2 +-
>  sheep/farm/trunk.c        |    2 +-
>  sheep/gateway.c           |   39 ++++++++-
>  sheep/group.c             |  202 +++++++++++++++++++++++++++++++++++++++++-----
>  sheep/object_list_cache.c |  182 +++++++++++++++++++++++++++++++++++++++++--
>  sheep/ops.c               |   85 ++++++++++++++++---
>  sheep/recovery.c          |  133 +++++++++++++++++++++++++++---
>  sheep/sheep_priv.h        |   57 ++++++++++++-
>  16 files changed, 743 insertions(+), 104 deletions(-)
I've looked into this series, and IMHO the change is too complex.

With this series, when recovery is disabled and some nodes have left,
sheep can succeed in a write operation even if the data is not fully
replicated. But if we allow that, it is difficult to prevent VMs from
reading old data. In fact, this series puts a lot of effort into
preventing exactly that.

I'd suggest allowing the epoch to be incremented even when recovery is
disabled. If the recovery work recovers only rw->prio_oids and delays
the recovery of rw->oids, I think we can get a similar benefit in a
much simpler way:

  http://www.mail-archive.com/sheepdog@lists.wpkg.org/msg05439.html

Thanks,

Kazutaka
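
To make the suggestion above concrete, here is a minimal sketch of a
recovery pass that always recovers rw->prio_oids and defers rw->oids
while recovery is disabled. It is not the actual sheep code: the struct
layout, the progress counters, the callback type and the function names
are all assumptions made for illustration; only the field names
prio_oids and oids come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for sheep's recovery work.
 * Only the names prio_oids and oids are taken from the mail; the
 * counters and layout are assumptions for this sketch.
 */
struct recovery_work_sketch {
	const uint64_t *prio_oids;	/* objects requested with priority */
	size_t nr_prio_oids;
	const uint64_t *oids;		/* all other objects to recover */
	size_t nr_oids;
	size_t done_prio;		/* progress through prio_oids */
	size_t done;			/* progress through oids */
};

/* Hypothetical callback that recovers a single object. */
typedef void (*recover_fn)(uint64_t oid, void *ctx);

/* Example callback: only counts how many objects were recovered. */
static void count_object(uint64_t oid, void *ctx)
{
	(void)oid;
	++*(size_t *)ctx;
}

/*
 * Run one recovery pass. Prioritised objects are always recovered.
 * The bulk list is processed only when recovery is enabled; otherwise
 * it stays pending for a later pass. Returns the number of objects
 * recovered in this pass.
 */
static size_t run_recovery_pass(struct recovery_work_sketch *rw,
				bool recovery_enabled,
				recover_fn recover, void *ctx)
{
	size_t n = 0;

	assert(rw && recover);

	while (rw->done_prio < rw->nr_prio_oids) {
		recover(rw->prio_oids[rw->done_prio++], ctx);
		n++;
	}

	if (!recovery_enabled)
		return n;	/* bulk recovery is delayed, not dropped */

	while (rw->done < rw->nr_oids) {
		recover(rw->oids[rw->done++], ctx);
		n++;
	}

	return n;
}
```

With recovery disabled, a pass over two prioritised and three ordinary
objects recovers only the two prioritised ones; a later pass with
recovery enabled picks up the remaining three. The epoch can therefore
advance without forcing the bulk recovery to run immediately.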