Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-18 Thread Stefan Priebe - Profihost AG
Maybe that's the problem. Finalize never gets called, so the qemu block driver doesn't get any feedback and then cancels the I/O request. Stefan On 18.11.2012 at 03:38, Josh Durgin josh.dur...@inktank.com wrote: On 11/17/2012 02:19 PM, Stefan Priebe wrote: Hello list, right now librbd returns

wip-bobtail-docs

2012-11-18 Thread Sage Weil
I created a branch for documentation changes that are pending for bobtail. I thought there was more than the one item in there now, but I'm not seeing anything else. Please add whatever you have there so that we can coordinate the changes with release. Thanks! sage -- To unsubscribe from

Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-18 Thread Stefan Priebe - Profihost AG
It's done as buffered I/O. So finalize is only called when buffered is false? Stefan On 18.11.2012 at 03:38, Josh Durgin josh.dur...@inktank.com wrote: On 11/17/2012 02:19 PM, Stefan Priebe wrote: Hello list, right now librbd returns an error if I issue a discard for a sector / byte range

Re: libcephfs create file with layout and replication

2012-11-18 Thread Noah Watkins
Wanna have a look at a first pass on this patch? wip-client-open-layout Thanks, Noah On Sat, Nov 17, 2012 at 5:20 PM, Noah Watkins jayh...@cs.ucsc.edu wrote: On Sat, Nov 17, 2012 at 4:15 PM, Sage Weil s...@inktank.com wrote: We ignore that for the purposes of getting the libcephfs API

Re: [BUG] rbd discard should return OK even if rbd file does not exist

2012-11-18 Thread Stefan Priebe
Hmm, OK, finalize is only called if building is false and there are no pending requests. But the discard is issued as a batch of 1000 requests or so, so finalize is only called at the end. And it seems this is too late for the qemu block driver. Greets, Stefan On 18.11.2012 03:38, Josh
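The behaviour described in this message can be pictured with a small toy model. This is only a sketch under assumptions: the class `Gather` and all of its method names are hypothetical and are not librbd's actual API. It shows a completion aggregator whose finalize callback fires only once building has finished and no requests are pending, so for a discard split into many requests the callback runs only after the last one completes.

```python
class Gather:
    """Toy completion aggregator (hypothetical names, not librbd code)."""

    def __init__(self, on_finalize):
        self.building = True      # still adding sub-requests
        self.pending = 0          # sub-requests not yet completed
        self.completed = 0        # sub-requests completed so far
        self.finalized = False
        self.on_finalize = on_finalize

    def add_request(self):
        self.pending += 1

    def finish_building(self):
        self.building = False
        self._maybe_finalize()

    def complete_one(self):
        self.pending -= 1
        self.completed += 1
        self._maybe_finalize()

    def _maybe_finalize(self):
        # Finalize only when building is done AND nothing is pending.
        if not self.building and self.pending == 0 and not self.finalized:
            self.finalized = True
            self.on_finalize(self.completed)


def completions_before_finalize(n_requests):
    """Return how many sub-requests had completed when finalize fired."""
    fired_at = []
    gather = Gather(fired_at.append)
    for _ in range(n_requests):
        gather.add_request()
    gather.finish_building()
    for _ in range(n_requests):
        gather.complete_one()
    return fired_at
```

In this model the caller hears nothing until every sub-request has finished, which matches the observation that the feedback may arrive too late for a caller with a timeout.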

some snapshot problems

2012-11-18 Thread liu yaqi
-- Forwarded message -- From: liu yaqi liuyaqiy...@gmail.com Date: 2012/11/12 Subject: Re: some snapshot problems To: Sage Weil s...@inktank.com Cc: ceph-devel@vger.kernel.org, c...@ist.utl.pt, josh.dur...@inktank.com 2012/11/9 Sage Weil s...@inktank.com Lots of different

[PATCH 00/16] Various fixes for mds

2012-11-18 Thread Yan, Zheng
Hi, This patch series fixes various problems I encountered when running 2 MDS with thrash_exports enabled. The first 3 patches are general MDS fixes; the remaining patches fix bugs that only happen in a multiple-MDS setup. I tested these patches using 2 MDS with

[PATCH 01/16] mds: don't expire log segment before it's fully flushed

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Expiring a log segment before it's fully flushed may cause various issues during log replay. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/MDLog.cc | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/mds/MDLog.cc

[PATCH 02/16] mds: fix anchor table update

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The reference count of an anchor table entry that corresponds to a directory is the number of anchored inodes under the directory. But when updating the anchor trace for a directory inode, the code only increases/decreases its new/old ancestor anchor table entries'

[PATCH 03/16] mds: don't add not issued caps when confirming cap receipt

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com There is a message ordering race in the cephfs kernel client. We compose cap messages while i_ceph_lock is held. But when adding messages to the output queue, the kernel releases i_ceph_lock and acquires a mutex. So it is possible that cap messages are sent out of

[PATCH 04/16] mds: allow try_eval to eval unstable locks in freezing object

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Unstable locks hold auth_pins on the object, which prevents the freezing object from becoming frozen and then unfreezing. So try_eval() should not wait for a freezing object. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Locker.cc | 4 ++-- 1 file changed,

[PATCH 05/16] mds: Don't acquire replica object's versionlock

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Both CInode's and CDentry's versionlocks are of type LocalLock. Acquiring a LocalLock in a replica object is useless and problematic. For example, if two requests try acquiring a replica object's versionlock, the first request succeeds, the second request is added

[PATCH 06/16] mds: clear lock flushed if replica is waiting for AC_LOCKFLUSHED

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com So eval_gather() will not skip calling scatter_writebehind(), otherwise the replica lock may be in flushing state forever. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Locker.cc | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-)

[PATCH 07/16] mds: call eval() after caps are exported

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com For an inode that has just changed authority, if the new auth MDS wants to change a lock in the inode from 'sync' to 'lock' state before caps are exported, the lock in the replica can be in 'sync-lock' state because client caps prevent it from transitioning to 'lock' state.

[PATCH 08/16] mds: don't forward client request from MDS

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Forwarding a client request that came from an MDS will trigger an assertion in MDS::forward_message_mds(). An MDS only sends client requests for stray migration/reintegration, so it's safe to drop them. Signed-off-by: Yan, Zheng zheng.z@intel.com ---

[PATCH 09/16] mds: check parent inode's versionlock when propagating rstats

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com To propagate rstats one level up, we need to lock both the parent inode's nestlock and versionlock. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/MDCache.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mds/MDCache.cc

[PATCH 11/16] mds: consider revoking caps in imported caps as issued

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The clients may have already sent the caps release message to the exporting MDS, so the importing MDS waits for the release message forever. Considering revoking caps as issued avoids this issue. Signed-off-by: Yan, Zheng zheng.z@intel.com ---

[PATCH 10/16] mds: drop locks if requiring auth pinning new objects.

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Locker::acquire_locks() skips auth pinning a replica object if we only request a rdlock and the lock is read-lockable. To get all locks, we may call Locker::acquire_locks() several times, and locks in replica objects may become not read-lockable between calls. So it

[PATCH 12/16] mds: fix open_remote_inode race

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com discover_ino() may return -ENOENT if it races with other FS activities. So use C_MDC_RetryOpenRemoteIno instead of C_MDC_OpenRemoteIno as the onfinish callback. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/MDCache.cc | 5 +++-- 1 file changed, 3

[PATCH 15/16] mds: allow open_remote_ino() to open xlocked dentry

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com discover_ino() has a parameter want_xlocked. The parameter indicates whether the remote discover handler can proceed when an xlocked dentry is encountered. open_remote_ino() uses discover_ino() to find a non-auth inode, but always sets 'want_xlocked' to false. This may

[PATCH 14/16] mds: fix freeze inode deadlock

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com CInode::freeze_inode() is used in the case of a cross-authority rename. Server::handle_slave_rename_prep() calls it to wait for all other operations on the source inode to complete. This happens after all locks for the rename operation are acquired. But to acquire

[PATCH 0/6] fixes for cephfs

2012-11-18 Thread Yan, Zheng
Hi, These patches are fixes for cephfs kernel client bugs when running 2 MDS with thrash_exports enabled. These patches are also in: git://github.com/ukernel/ceph-client.git master Regards Yan, Zheng

[PATCH 1/6] ceph: Don't update i_max_size when handling non-auth cap

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The cap from a non-auth MDS doesn't have a meaningful max_size value. Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/caps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index

[PATCH 2/6] ceph: Hold caps_list_lock when adjusting caps_{use,total}_count

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/caps.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index c633d1d..8072aef 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -236,8 +236,10 @@ static

[PATCH 3/6] ceph: Fix infinite loop in __wake_requests

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com __wake_requests() will enter an infinite loop if we use it to wake requests in the session->s_waiting list. __wake_requests() deletes requests from the list and __do_request() adds requests back to the list. Signed-off-by: Yan, Zheng zheng.z@intel.com ---
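The loop described in this message can be illustrated with a toy model. This is a sketch under assumptions, not the kernel code or the patch itself (the diff is cut off above): `do_request`, `wake_requests_buggy` and `wake_requests` are hypothetical Python stand-ins. The drain loop pops entries from a list while the handler pushes unfinished entries back onto the same list, so the list never empties. One common remedy, assumed here, is to move the entries to a private batch first and drain only that batch.

```python
from collections import deque


def do_request(req, waiting):
    """Hypothetical stand-in for __do_request(): a request that still
    cannot proceed puts itself back on the waiting list."""
    if not req["ready"]:
        waiting.append(req)
        return False
    return True


def wake_requests_buggy(waiting, max_steps=100):
    """Drain the shared list directly. Re-queued requests keep the list
    non-empty, so only the max_steps guard stops this toy version."""
    steps = 0
    while waiting and steps < max_steps:
        req = waiting.popleft()
        do_request(req, waiting)
        steps += 1
    return steps


def wake_requests(waiting):
    """Move everything to a private batch first, then drain the batch.
    Requests re-queued by do_request() stay on `waiting` for later."""
    batch = deque(waiting)
    waiting.clear()
    steps = 0
    while batch:
        req = batch.popleft()
        do_request(req, waiting)
        steps += 1
    return steps
```

With one request that is not ready, the buggy version only stops at its step guard, while the batched version visits each request once and leaves the unfinished one on the waiting list.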

[PATCH 4/6] ceph: Don't add dirty inode to dirty list if caps is in migration

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Add the dirty inode to the cap_dirty_migrating list instead; this avoids ceph_flush_dirty_caps() entering an infinite loop. Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/caps.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff

[PATCH 6/6] ceph: call handle_cap_grant() for cap import message

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com If a client sends a cap message that requests a new max size while caps are being exported, the exporting MDS will drop the message quietly. So the client may wait forever for the reply that updates the max size. Calling handle_cap_grant() for the cap import message can avoid

[PATCH 5/6] ceph: Fix __ceph_do_pending_vmtruncate

2012-11-18 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com We should set i_truncate_pending to 0 after the page cache is truncated to i_truncate_size. Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/inode.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/ceph/inode.c