On Tue, Nov 26, 2013 at 12:54:57PM +0900, MORITA Kazutaka wrote:
> At Mon, 25 Nov 2013 17:02:06 +0800,
> Liu Yuan wrote:
> >
> > On Mon, Nov 25, 2013 at 05:43:19PM +0900, MORITA Kazutaka wrote:
> > > At Mon, 25 Nov 2013 15:03:46 +0800,
> > > Robin Dong wrote:
> > > >
> > > > The present implementation of http/swift is not perfect: it can't
> > > > create too many containers or objects. So we want to store all
> > > > objects in one hyper volume vdi and use a new structure, 'obj-inode',
> > > > to identify each object's offset and length in this vdi, just like a
> > > > local file system. To achieve this, we need distributed locks to
> > > > ensure that only one thread can create (or delete) an 'obj-inode'
> > > > in this vdi at a time.
> > > >
> > > > This patch set is an attempt to implement the distributed lock.
> > > >
> > > > If we add code in sheep/cluster/zookeeper.c and use the cluster
> > > > framework to implement this distributed lock, then we have to add
> > > > implementations for corosync, local and shepherd. That's too
> > > > complicated. So what we need is to add lock.c in sheep/http/ and
> > > > only use it in the http interface.
> > >
> > > If possible, I don't like to see zookeeper-specific code outside of
> > > sheep/cluster/zookeeper.c. Can we use a SD_OP_TYPE_CLUSTER operation
> > > for your purpose? It works like a cluster-wide distributed lock.
> > >
> > > For example, vdi creation works as follows.
> > >
> > > 1. When sheep receives a SD_OP_NEW_VDI operation, sheep calls
> > >    cdrv->block() to block all the other cluster operations.
> > >
> > > 2. Sheep calls cluster_new_vdi() in sd_block_handler(). It is
> > >    ensured that no other sheep call sd_block_handler() at the same
> > >    time. This is necessary here because sheepdog doesn't allow
> > >    concurrent vdi creation requests.
> > >
> > > 3. All the sheep in the cluster call post_cluster_new_vdi() in
> > >    sd_notify_handler(). It is usually used for notification or
> > >    cleanups.
> > >
> >
> > I don't think this approach is efficient, though it is simpler because
> > we can make use of the existing mechanism, since:
> >
> > - it can't scale: there is only one lock in the cluster, and every
> >   object creation from different containers will compete for this
> >   lock.
> >
> > - it can be affected by operations not even related to http
> >   operations. For example, 'vdi create' will block the cluster, which
> >   means that until it unblocks the cluster, we can't create/delete
> >   objects or containers at all.
> >
> > I think a lock per operation is really needed. E.g., every container
> > has a lock, to achieve concurrency when creating objects without
> > interfering with other containers.
>
> Getting a distributed lock is an expensive operation and it can cause
> a severe performance problem if we do it for each object creation.
> Can we find another way? Sheepdog is not designed to allow concurrent
> write access.
>
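
The scaling concern quoted above (one cluster-wide lock versus one lock
per container) can be sketched with a toy lock table. This is only an
illustration, not sheepdog code: try_acquire(), release_lock(), the lock
names and the gateway ids are all hypothetical, and the in-memory table
merely stands in for whatever service (for example zookeeper) would
actually hold the locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16	/* hypothetical table size */
#define NAME_LEN  64	/* hypothetical maximum lock-name length */

/* One entry per held lock; a stand-in for the state of a real lock service. */
struct held_lock {
	char name[NAME_LEN];
	int owner;	/* hypothetical gateway id, -1 means the slot is free */
};

static struct held_lock lock_table[MAX_LOCKS];

static void lock_table_init(void)
{
	for (int i = 0; i < MAX_LOCKS; i++)
		lock_table[i].owner = -1;
}

/*
 * Try to take the lock called 'name' on behalf of gateway 'owner'.
 * Returns false if the lock is already held, the name is too long,
 * or the table is full.
 */
static bool try_acquire(const char *name, int owner)
{
	int free_slot = -1;

	assert(owner >= 0);
	if (strlen(name) >= NAME_LEN)
		return false;

	for (int i = 0; i < MAX_LOCKS; i++) {
		if (lock_table[i].owner == -1) {
			if (free_slot < 0)
				free_slot = i;
		} else if (strcmp(lock_table[i].name, name) == 0) {
			return false;	/* somebody already holds this lock */
		}
	}
	if (free_slot < 0)
		return false;	/* no room left in the table */

	strcpy(lock_table[free_slot].name, name);
	lock_table[free_slot].owner = owner;
	return true;
}

/* Release 'name' if, and only if, 'owner' is the current holder. */
static void release_lock(const char *name, int owner)
{
	for (int i = 0; i < MAX_LOCKS; i++) {
		if (lock_table[i].owner == owner &&
		    strcmp(lock_table[i].name, name) == 0)
			lock_table[i].owner = -1;
	}
}
```

With a single lock named "cluster", a second gateway is refused even when
it works on a different container. With per-container names such as
"container/photos" and "container/videos", both acquisitions succeed and
only writers to the same container contend.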
If every container has its own lock, I think the performance isn't that
bad, no?

>
> For example, how about determining one gateway based on the hash value
> of the requested container name, and forwarding write requests to the
> appropriate gateway so that all the objects in the same container are
> accessed from only one gateway?
>

What if the chosen gateway crashes? The fail-over wouldn't be so easy,
since the state related to container/object operations is only stored
in the crashed gateway.

Thanks
Yuan
-- 
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog
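
For reference, the hash-based routing proposed above and the fail-over
question raised in reply can be sketched as follows. This is a minimal
sketch under assumptions, not sheepdog code: pick_gateway() and the
gateway ids are hypothetical, FNV-1a is used only as a convenient
well-known string hash, and plain "hash modulo number of live gateways"
is assumed as the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a string hash; any reasonably uniform hash would do here. */
static uint32_t fnv1a(const char *s)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/*
 * Pick the gateway responsible for 'container' among the gateways that
 * are currently alive. 'live' holds hypothetical gateway ids and
 * 'nr_live' must be greater than zero.
 */
static int pick_gateway(const char *container, const int *live,
			size_t nr_live)
{
	assert(nr_live > 0);
	return live[fnv1a(container) % nr_live];
}
```

As long as the membership is stable, every request for the same
container is routed to the same gateway. Once that gateway is dropped
from 'live', the container maps to a surviving gateway, which starts
without any of the in-memory state the crashed gateway held. With a
plain modulo mapping, containers owned by other gateways may move as
well when the membership changes.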