[PATCH 00/29] Various fixes for MDS

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com This patch series fixes various issues I encountered when running 3 MDSes. I tested this patch series by running fsstress on two clients, using the same test directory. The MDSes and clients could survive an overnight test at times. This patch series is also in:

[PATCH 01/29] mds: don't renew revoking lease

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The MDS may receive a lease renew request while the lease is being revoked; just ignore the renew request. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Locker.cc | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff

[PATCH 02/29] mds: fix Locker::simple_eval()

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Locker::simple_eval() checks if the loner wants CEPH_CAP_GEXCL to decide if it should change the lock to EXCL state, but it checks if CEPH_CAP_GEXCL is issued to the loner to decide if it should change the lock to SYNC state. So if the loner wants

[PATCH 03/29] mds: don't trigger assertion when discover races with rename

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com A discover reply that adds a replica dentry and inode can race with rename if the slave request for the rename sends a discover and waits, but is woken up by the reply to a different discover. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/CDentry.cc | 17

[PATCH 04/29] mds: xlock stray dentry when handling rename or unlink

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com This prevents MDS from reintegrating stray before rename/unlink finishes Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Server.cc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mds/Server.cc b/src/mds/Server.cc index

[PATCH 05/29] mds: don't journal null dentry for overwrited remote linkage

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Server::_rename_prepare() adds a null dest dentry to the EMetaBlob if the rename operation overwrites a remote linkage. This is incorrect because null dentries are processed after primary and remote dentries during journal replay. The erroneous null dentry makes

[PATCH 06/29] mds: use null dentry to find old parent of renamed directory

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com When replaying a directory rename operation, the MDS needs to find the old parent of the renamed directory to adjust the auth subtree. The current code searches the cache to find the old parent, which does not work if the renamed directory inode is not in the cache. EMetaBlob

[PATCH 07/29] mds: don't trim ambiguous imports in MDCache::trim_non_auth_subtree

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Trimming ambiguous imports in MDCache::trim_non_auth_subtree() confuses MDCache::disambiguate_imports() and causes infinite loop. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/MDCache.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH 08/29] mds: only export directory fragments in stray to their auth MDS

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Migrator.cc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mds/Migrator.cc b/src/mds/Migrator.cc index 5db21cd..8686c86 100644 --- a/src/mds/Migrator.cc +++

[PATCH 10/29] mds: skip frozen inode when assimilating dirty inodes' rstat

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com CDir::assimilate_dirty_rstat_inodes() may encounter frozen inodes that are being renamed. Skip these frozen inodes because assimilating an inode's rstat requires auth pinning the inode. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/CDir.cc | 13

[PATCH 09/29] mds: mark rename inode as ambiguous auth on all involved MDS

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com When handling cross authority rename, the master first sends OP_RENAMEPREP slave requests to witness MDS, then sends OP_RENAMEPREP slave request to the rename inode's auth MDS after getting all witness MDS' acknowledgments. Before receiving the OP_RENAMEPREP

[PATCH 11/29] mds: fix anchor table commit race

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Anchor table updates for a given inode are fully serialized on the client side. But due to network latency, two commit requests from different clients can arrive at the anchor server out of order. The anchor table gets corrupted if updates are committed in wrong

[PATCH 13/29] mds: indroduce DROPLOCKS slave request

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com In some rare cases, Locker::acquire_locks() drops all acquired locks in order to auth pin new objects. But Locker::drop_locks only drops explicitly acquired remote locks; it does not drop objects' version locks that were implicitly acquired on remote MDS. These

[PATCH 15/29] mds: remove unnecessary is_xlocked check

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Locker::foo_eval() is always called for stable locks, so no need to check if the lock is xlocked. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Locker.cc | 21 - 1 file changed, 4 insertions(+), 17 deletions(-) diff --git

[PATCH 14/29] mds: fix lock state transition check

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Locker::simple_excl() and Locker::scatter_mix() are missing an is_rdlocked check; Locker::file_excl() is missing both an is_rdlocked check and an is_wrlocked check. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Locker.cc | 9 + 1 file changed, 9 insertions(+)

[PATCH 17/29] mds: call maybe_eval_stray after removing a replica dentry

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com MDCache::handle_cache_expire() processes dentries after inodes, so the MDCache::maybe_eval_stray() in MDCache::inode_remove_replica() always fails to remove stray inode because MDCache::eval_stray() checks if the stray inode's dentry is replicated.

[PATCH 16/29] mds: don't defer processing caps if inode is auth pinned

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com We should not defer processing caps if the inode is auth pinned by MDRequest, because the MDRequest may change lock state of the inode later and wait for the deferred caps. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Locker.cc | 9 -

[PATCH 20/29] mds: forbid creating file in deleted directory

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/CInode.cc | 5 - src/mds/CInode.h | 2 ++ src/mds/Server.cc | 13 - 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc index

[PATCH 19/29] mds: disable concurrent remote locking

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The current code allows multiple MDRequests to concurrently acquire a remote lock. But a lock ACK message wakes all requests because they were all put on the same waiting queue. One request gets the lock, and the remaining requests will re-send the OP_WRLOCK/OPWRLOCK

[PATCH 18/29] mds: fix rename inode exportor check

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Using srcdn->is_auth() && destdnl->is_primary() to check whether the MDS is the inode exporter of a rename operation is not reliable, because the OP_FINISH slave request may race with subtree import. The fix is to use a variable in MDRequest to indicate if the MDS is inode

[PATCH 21/29] mds: keep dentry lock in sync state as much as possible

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Unlike locks of other types, a dentry lock in an unreadable state can block path traversal, so it should be in sync state as much as possible. There are two rare cases in which the dentry lock is not set to sync state: the dentry becomes replicated; finishing xlock but

[PATCH 22/29] mds: fix replica state for LOCK_MIX_LOCK

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com LOCK_MIX_LOCK state is for gathering local locks and caps, so replica state should be LOCK_MIX. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/locks.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mds/locks.c

[PATCH 23/29] mds: fix cap mask for ifile lock

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The ifile lock has 8 cap bits, so its cap mask should be 0xff. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/SimpleLock.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mds/SimpleLock.h b/src/mds/SimpleLock.h index
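The arithmetic behind that mask can be sketched as follows. This is only an illustration of how a mask covering the N lowest cap bits is derived; `cap_mask_for_bits` is a hypothetical helper introduced here, not a function from src/mds/SimpleLock.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Ceph): build a mask that covers the
 * lowest `bits` bits. For a lock with 8 cap bits this yields 0xff.
 */
static inline uint32_t cap_mask_for_bits(unsigned int bits)
{
    assert(bits > 0 && bits < 32);   /* keep the shift well defined */
    return (UINT32_C(1) << bits) - 1;
}
```

With this helper, `cap_mask_for_bits(8)` gives the 0xff value named in the patch description, while a narrower mask would silently drop the upper cap bits.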

[PATCH 24/29] mds: rdlock prepended dest trace when handling rename

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com rdlock prepended dest trace to prevent them from being xlocked by someone else. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Server.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mds/Server.cc b/src/mds/Server.cc index

[PATCH 25/29] mds: check null context in CDir::fetch()

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/CDir.cc | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/mds/CDir.cc b/src/mds/CDir.cc index 91636cc..22cdf48 100644 --- a/src/mds/CDir.cc +++ b/src/mds/CDir.cc @@

[PATCH 26/29] mds: drop locks when opening remote dentry

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Opening a remote dentry while holding locks may cause a deadlock. For example, 'discover' is blocked by an xlocked dentry, and the request holding the xlock is blocked by the locks held by the readdir request. Signed-off-by: Yan, Zheng zheng.z@intel.com ---

[PATCH 28/29] mds: don't issue caps while inode is exporting caps

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com If the MDS issues caps while the inode is exporting caps, the client will drop the caps soon after it receives the CAP_OP_EXPORT message, but the client will not receive a corresponding CAP_OP_IMPORT message. Except for open file requests, it's OK to not issue caps for client

[PATCH 29/29] mds: optimize C_MDC_RetryOpenRemoteIno

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com When opening remote inode, C_MDC_RetryOpenRemoteIno is used as onfinish context for discovering remote inode. When it is called, the MDS may already have the inode. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/MDCache.cc | 6 +- 1 file

[PATCH 27/29] mds: check if stray dentry is needed

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The necessity of stray dentry can change before the request acquires all locks. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Server.cc | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git

[PATCH 0/6] fixes for cephfs kernel client

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com These patches are client-side modifications for a multiple-MDS setup. I'm not quite sure whether patch 6 is correct; please review it. These patches are also in: git://github.com/ukernel/ceph-client.git wip-ceph -- To unsubscribe from this list: send the line

[PATCH 1/6] ceph: re-calculate truncate_size for strip object

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Otherwise osd may truncate the object to larger size. Signed-off-by: Yan, Zheng zheng.z@intel.com --- net/ceph/osd_client.c | 8 1 file changed, 8 insertions(+) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index eb9a444..267f183

[PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Allow revoking duplicated caps issued by non-auth MDS if these caps are also issued by auth MDS. Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/caps.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git

[PATCH 4/6] ceph: allocate cap_release message when receiving cap import

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com When the client wants to release an imported cap, it's possible there is no reserved cap_release message in the corresponding mds session, so __queue_cap_release causes a kernel panic. Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/caps.c | 3 +++ 1

[PATCH 5/6] ceph: check mds_wanted for imported cap

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com The MDS may have incorrect wanted caps after importing caps. So the client should check the value mds has and send cap update if necessary. Signed-off-by: Yan, Zheng zheng.z@intel.com --- fs/ceph/caps.c | 10 ++ 1 file changed, 6 insertions(+),

[PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work

2013-01-04 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin(). So ceph_get_caps() can be called while i_mutex is locked. If there is pending vmtruncate, ceph_get_caps() will wait for it to complete, but ceph_vmtruncate_work() is blocked by the

Re: [PATCH 0/2] Librados aio stat

2013-01-04 Thread Filippos Giannakos
Hi Team, Is there any progress or any comments regarding the librados aio stat patch ? Best regards On 12/20/2012 10:05 PM, Filippos Giannakos wrote: Hi Team, Here is the patch with the changes, plus the tests you requested. Best regards, Filippos -- Filippos.

[PATCH REPOST 0/6] libceph: parameter cleanup

2013-01-04 Thread Alex Elder
This series mostly cleans up parameters used by functions in libceph, in the osd client code. -Alex [PATCH REPOST 1/6] libceph: pass length to ceph_osdc_build_request() [PATCH REPOST 2/6] libceph: pass length to ceph_calc_file_object_mapping() [PATCH

[PATCH REPOST 1/6] libceph: pass length to ceph_osdc_build_request()

2013-01-04 Thread Alex Elder
The len argument to ceph_osdc_build_request() is set up to be passed by address, but that function never updates its value so there's no need to do this. Tighten up the interface by passing the length directly. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c |

[PATCH REPOST 2/6] libceph: pass length to ceph_calc_file_object_mapping()

2013-01-04 Thread Alex Elder
ceph_calc_file_object_mapping() takes (among other things) a file offset and length, and based on the layout, determines the object number (bno) backing the affected portion of the file's data and the offset into that object where the desired range begins. It also computes the size that should be
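The kind of mapping described here can be sketched in a simplified form. The code below is not the libceph implementation: it assumes no striping at all (the file is a plain concatenation of fixed-size objects), and `struct object_extent` and `map_file_extent` are hypothetical names introduced only for this illustration. The third output, how many bytes of the range fit in the first object, is this sketch's own choice of what to compute.

```c
#include <assert.h>
#include <stdint.h>

/* Result of mapping the start of a file range onto one backing object. */
struct object_extent {
    uint64_t bno;    /* object number backing the start of the range */
    uint64_t oxoff;  /* offset into that object where the range begins */
    uint64_t oxlen;  /* bytes of the range that fit inside that object */
};

/*
 * Simplified sketch, NOT the libceph code: assumes an unstriped layout,
 * so object N holds file bytes [N * object_size, (N + 1) * object_size).
 */
static inline struct object_extent
map_file_extent(uint64_t off, uint64_t len, uint64_t object_size)
{
    struct object_extent ext;
    uint64_t room;

    assert(object_size > 0);

    ext.bno = off / object_size;
    ext.oxoff = off % object_size;
    room = object_size - ext.oxoff;       /* space left in this object */
    ext.oxlen = len < room ? len : room;  /* clamp at the object boundary */
    return ext;
}
```

For example, with 4 MiB objects, a range starting 100 bytes into the sixth object maps to object number 5 at offset 100, and at most `4 MiB - 100` bytes of it land in that object. A real striped layout would also need the stripe unit and stripe count.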

[PATCH REPOST 3/6] libceph: drop snapid in ceph_calc_raw_layout()

2013-01-04 Thread Alex Elder
A snapshot id must be provided to ceph_calc_raw_layout() even though it is not needed at all for calculating the layout. Where the snapshot id *is* needed is when building the request message for an osd operation. Drop the snapid parameter from ceph_calc_raw_layout() and pass that value instead

[PATCH REPOST 4/6] libceph: drop osdc from ceph_calc_raw_layout()

2013-01-04 Thread Alex Elder
The osdc parameter to ceph_calc_raw_layout() is not used, so get rid of it. Consequently, the corresponding parameter in calc_layout() becomes unused, so get rid of that as well. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c |2 +-

[PATCH REPOST 5/6] libceph: don't set flags in ceph_osdc_alloc_request()

2013-01-04 Thread Alex Elder
The only thing ceph_osdc_alloc_request() really does with the flags value it is passed is assign it to the newly-created osd request structure. Do that in the caller instead. Both callers subsequently call ceph_osdc_build_request(), so have that function (instead of ceph_osdc_alloc_request())

[PATCH REPOST 6/6] libceph: don't set pages or bio in, ceph_osdc_alloc_request()

2013-01-04 Thread Alex Elder
Only one of the two callers of ceph_osdc_alloc_request() provides page or bio data for its payload. And essentially all that function was doing with those arguments was assigning them to fields in the osd request structure. Simplify ceph_osdc_alloc_request() by having the caller take care of

[PATCH REPOST 0/4] rbd: explicitly support only one osd op

2013-01-04 Thread Alex Elder
An osd request can be made up of multiple ops, all of which are completed (or not) transactionally. There is partial support for multiple ops in an rbd request in the rbd code, but it's incomplete and not even supported by the osd client or the messenger right now. I see three problems with this

[PATCH REPOST 1/4] rbd: pass num_op with ops array

2013-01-04 Thread Alex Elder
Add a num_op parameter to rbd_do_request() and rbd_req_sync_op() to indicate the number of entries in the array. The callers of these functions always know how many entries are in the array, so just pass that information down. This is in anticipation of eliminating the extra zero-filled entry in

[PATCH REPOST 2/4] libceph: pass num_op with ops

2013-01-04 Thread Alex Elder
Both ceph_osdc_alloc_request() and ceph_osdc_build_request() are provided an array of ceph osd request operations. Rather than just passing the number of operations in the array, the caller is required to append an additional zeroed operation structure to signal the end of the array. All callers
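The difference between the two calling conventions can be sketched as below. This is a minimal illustration, not libceph code: `struct demo_op` is a hypothetical stand-in for an osd request op, and treating opcode 0 as the zeroed terminator is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an osd request op; opcode 0 means "zeroed". */
struct demo_op {
    int opcode;
};

/* Sentinel style: the callee walks the array until the zeroed entry. */
static inline size_t count_ops_sentinel(const struct demo_op *ops)
{
    size_t n = 0;

    assert(ops != NULL);
    while (ops[n].opcode != 0)
        n++;
    return n;
}

/* Counted style: the caller states how many entries are valid. */
static inline int sum_opcodes_counted(const struct demo_op *ops, size_t num_op)
{
    int sum = 0;
    size_t i;

    assert(ops != NULL || num_op == 0);
    for (i = 0; i < num_op; i++)
        sum += ops[i].opcode;
    return sum;
}
```

In the counted style the array needs no extra terminator entry, and the callee never has to scan the array just to learn its length.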

[PATCH REPOST 3/4] rbd: there is really only one op

2013-01-04 Thread Alex Elder
Throughout the rbd code there are spots where it appears we can handle an osd request containing more than one osd request op. But that is only the way it appears. In fact, currently only one operation at a time can be supported, and supporting more than one will require much more than fleshing

[PATCH REPOST 4/4] rbd: assume single op in a request

2013-01-04 Thread Alex Elder
We now know that every caller of rbd_req_sync_op() passes an array of exactly one operation, as evidenced by all callers passing 1 as its num_op argument. So get rid of that argument, assuming a single op. Similarly, we now know that all callers of rbd_do_request() pass 1 as the num_op value, so that

[PATCH REPOST] rbd: kill ceph_osd_req_op->flags

2013-01-04 Thread Alex Elder
The flags field of struct ceph_osd_req_op is never used, so just get rid of it. Signed-off-by: Alex Elder el...@inktank.com --- include/linux/ceph/osd_client.h |1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index

[PATCH REPOST 0/3] rbd: no need for file mapping calculation

2013-01-04 Thread Alex Elder
Currently every osd request submitted by the rbd code undergoes a file mapping operation, which is common with what the ceph file system uses. But some analysis shows that there is no need to do this for rbd, because it already takes care of its own blocking of image data into distinct objects.

[PATCH REPOST 1/3] rbd: pull in ceph_calc_raw_layout()

2013-01-04 Thread Alex Elder
This is the first in a series of patches aimed at eliminating the use of ceph_calc_raw_layout() by rbd. It simply pulls in a copy of that function and renames it rbd_calc_raw_layout(). Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 36

[PATCH REPOST 2/3] rbd: open code rbd_calc_raw_layout()

2013-01-04 Thread Alex Elder
This patch gets rid of rbd_calc_raw_layout() by simply open coding it in its one caller. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 55 +-- 1 file changed, 18 insertions(+), 37 deletions(-) diff --git

[PATCH REPOST 3/3] rbd: don't bother calculating file mapping

2013-01-04 Thread Alex Elder
When rbd_do_request() has a request to process it initializes a ceph file layout structure and uses it to compute offsets and limits for the range of the request using ceph_calc_file_object_mapping(). The layout used is fixed, and is based on RBD_MAX_OBJ_ORDER (30). It sets the layout's object

[PATCH REPOST] rbd: use a common layout for each device

2013-01-04 Thread Alex Elder
Each osd message includes a layout structure, and for rbd it is always the same (at least for osd's in a given pool). Initialize a layout structure when an rbd_dev gets created and just copy that into osd requests for the rbd image. Replace an assertion that was done when initializing the layout

[PATCH REPOST] rbd: combine rbd sync watch/unwatch functions

2013-01-04 Thread Alex Elder
The rbd_req_sync_watch() and rbd_req_sync_unwatch() functions are nearly identical. Combine them into a single function with a flag indicating whether a watch is to be initiated or torn down. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 81
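The refactoring pattern described here, two near-identical functions folded into one with a flag, can be sketched in plain C. Everything below is hypothetical and only mirrors the shape of the change: `struct demo_dev`, `demo_sync_watch`, and the return convention are assumptions, not the rbd driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; the real rbd_dev tracks far more. */
struct demo_dev {
    bool watching;
};

/*
 * One function replaces separate "watch" and "unwatch" helpers.
 * `start` selects whether the watch is initiated or torn down.
 * Returns 0 on success, -1 if the device is already in that state.
 */
static inline int demo_sync_watch(struct demo_dev *dev, bool start)
{
    assert(dev != NULL);

    if (dev->watching == start)
        return -1;          /* nothing to do: already started/stopped */
    dev->watching = start;  /* shared code path for both directions */
    return 0;
}
```

Callers then write `demo_sync_watch(dev, true)` to set up a watch and `demo_sync_watch(dev, false)` to tear it down, so any shared setup code lives in one place.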

[PATCH REPOST 0/2] rbd: fix two leaks

2013-01-04 Thread Alex Elder
When certain special osd requests are processed, they have data structures allocated that are not properly freed when the request is completed. This series fixes that. -Alex [PATCH REPOST 1/2] rbd: don't leak rbd_req on synchronous requests [PATCH REPOST

[PATCH REPOST 1/2] rbd: don't leak rbd_req on synchronous requests

2013-01-04 Thread Alex Elder
When rbd_do_request() is called it allocates and populates an rbd_req structure to hold information about the osd request to be sent. This is done for the benefit of the callback function (in particular, rbd_req_cb()), which uses this in processing when the request completes. Synchronous

[PATCH REPOST 2/2] rbd: don't leak rbd_req for rbd_req_sync_notify_ack()

2013-01-04 Thread Alex Elder
When rbd_req_sync_notify_ack() calls rbd_do_request() it supplies rbd_simple_req_cb() as its callback function. Because the callback is supplied, an rbd_req structure gets allocated and populated so it can be used by the callback. However rbd_simple_req_cb() is not freeing (or even using) the

[PATCH REPOST 0/6] rbd: consolidate osd request setup

2013-01-04 Thread Alex Elder
This series consolidates and encapsulates the setup of all osd requests into a single function which takes variable arguments appropriate for the type of request. The result groups together common code idioms and I think makes the spots that build these messages a little easier to read.

[PATCH REPOST 1/6] rbd: don't assign extent info in rbd_do_request()

2013-01-04 Thread Alex Elder
In rbd_do_request() there's a sort of last-minute assignment of the extent offset and length and payload length for read and write operations. Move those assignments into the caller (in those spots that might initiate read or write operations) Signed-off-by: Alex Elder el...@inktank.com ---

[PATCH REPOST 2/6] rbd: don't assign extent info in rbd_req_sync_op()

2013-01-04 Thread Alex Elder
Move the assignment of the extent offset and length and payload length out of rbd_req_sync_op() and into its caller in the one spot where a read (and note--no write) operation might be initiated. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 10 +++--- 1 file

[PATCH REPOST 3/6] rbd: initialize off and len in rbd_create_rw_op()

2013-01-04 Thread Alex Elder
Move the initialization of a read or write operation's offset, length, and payload length fields into rbd_create_rw_op(). This will actually get removed in the next patch, but it finishes the consolidation of setting these fields at osd op creation time. Signed-off-by: Alex Elder

[PATCH REPOST 4/6] rbd: define generalized osd request op routines

2013-01-04 Thread Alex Elder
Create a baseline function to encapsulate the creation of osd requests, along with a matching function to destroy them. For now this just duplicates what rbd_create_rw_op() does for read and write operations, but the next patches will expand on this. Since rbd_create_rw_op() is no longer used

[PATCH REPOST 5/6] rbd: move call osd op setup into rbd_osd_req_op_create()

2013-01-04 Thread Alex Elder
Move the initialization of the CEPH_OSD_OP_CALL operation into rbd_osd_req_op_create(). Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 48 ++-- 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/drivers/block/rbd.c

[PATCH REPOST 6/6] rbd: move remaining osd op setup into rbd_osd_req_op_create()

2013-01-04 Thread Alex Elder
The two remaining osd ops used by rbd are CEPH_OSD_OP_WATCH and CEPH_OSD_OP_NOTIFY_ACK. Move the setup of those operations into rbd_osd_req_op_create(), and get rid of rbd_create_rw_op() and rbd_destroy_op(). Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 68

[PATCH REPOST] rbd: assign watch request more directly

2013-01-04 Thread Alex Elder
Both rbd_req_sync_op() and rbd_do_request() have a linger parameter, which is the address of a pointer that should refer to the osd request structure used to issue a request to an osd. Only one case ever supplies a non-null linger argument: a CEPH_OSD_OP_WATCH start. And in that one case it is

THE END, for now

2013-01-04 Thread Alex Elder
That concludes my reposting of un-reviewed patches. I see I'm getting some reviews now, so I'll be building up a replacement testing branch with annotations to the commits indicating that. -Alex

Re: OSD memory leaks?

2013-01-04 Thread Sébastien Han
Hi Sam, Thanks for your answer and sorry for the late reply. Unfortunately I can't get anything useful out of the profiler; actually I do get output, but I guess it doesn't show what it is supposed to show... I will keep on trying this. Anyway yesterday I just thought that the problem might be due to some over usage

Re: [PATCH REPOST 2/4] rbd: add warning messages for missing arguments

2013-01-04 Thread Alex Elder
On 01/03/2013 07:10 PM, Dan Mick wrote: Do you want to include in the message some kind of indication which operation/function is involved? (this is definitely better, but even better might be to add rbd add or rbd_add_parse_args to the msgs) Sure. This comment really applies to the previous

[PATCH, v2] rbd: define and use rbd_warn()

2013-01-04 Thread Alex Elder
Define a new function rbd_warn() that produces a boilerplate warning message, identifying in the resulting message the affected rbd device in the best way available. Use it in a few places that now use pr_warning(). Signed-off-by: Alex Elder el...@inktank.com Reviewed-by: Dan Mick
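The idea of a warning helper that prefixes each message with the best available device identification can be sketched in userspace C. This is not the kernel's rbd_warn(): it formats into a caller-supplied buffer instead of logging, and `struct demo_dev`, `demo_warn`, the field names, and the exact prefix strings are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device description; either field may be NULL. */
struct demo_dev {
    const char *disk_name;
    const char *image_name;
};

/*
 * Format "rbd: <best identifier>: <message>" into buf.
 * Preference order (an assumption of this sketch): disk name, then
 * image name, then a bare "rbd: " prefix when nothing is known.
 * Returns the number of characters the full message needs, or a
 * negative value on a formatting error.
 */
static inline int demo_warn(char *buf, size_t size,
                            const struct demo_dev *dev,
                            const char *fmt, ...)
{
    va_list args;
    int used;
    int rest;

    assert(buf != NULL && size > 0 && fmt != NULL);

    if (dev && dev->disk_name)
        used = snprintf(buf, size, "rbd: %s: ", dev->disk_name);
    else if (dev && dev->image_name)
        used = snprintf(buf, size, "rbd: image %s: ", dev->image_name);
    else
        used = snprintf(buf, size, "rbd: ");

    if (used < 0 || (size_t)used >= size)
        return used;  /* error, or the prefix alone filled the buffer */

    va_start(args, fmt);
    rest = vsnprintf(buf + used, size - (size_t)used, fmt, args);
    va_end(args);

    return rest < 0 ? rest : used + rest;
}
```

Centralizing the prefix logic means call sites only supply the device and the message text, and every warning identifies its device the same way.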

[PATCH 4/6] ceph.spec.in: fix libcephfs-jni package name

2013-01-04 Thread Danny Al-Gaaf
Rename libcephfs-jni to libcephfs_jni1 to reflect the SO name/version of the library and to prevent RPMLINT from complaining about the naming. Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de --- ceph.spec.in | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ceph.spec.in

[PATCH 6/6] configure.ac: change junit4 handling

2013-01-04 Thread Danny Al-Gaaf
Change handling of --with-debug and junit4. Add a new conditional HAVE_JUNIT4 to be able to build ceph-test package also if junit4 isn't available. In this case simply don't build libcephfs-test.jar, but the rest of the tools. Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de --- configure.ac

[PATCH 3/6] ceph.spec.in: rename libcephfs-java package to cephfs-java

2013-01-04 Thread Danny Al-Gaaf
Rename the libcephfs-java package to cephfs-java since the package contains no (classic) library and RPMLINT complains about the name. Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de --- ceph.spec.in | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ceph.spec.in

[PATCH 5/6] configure.ac: remove AC_PROG_RANLIB

2013-01-04 Thread Danny Al-Gaaf
Remove the already commented-out AC_PROG_RANLIB to get rid of the warning: libtoolize: `AC_PROG_RANLIB' is rendered obsolete by `LT_INIT' Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de --- configure.ac | 1 - 1 file changed, 1 deletion(-) diff --git a/configure.ac b/configure.ac index

[PATCH 1/6] src/java/Makefile.am: fix default java dir

2013-01-04 Thread Danny Al-Gaaf
Fix default javadir in src/java/Makefile.am to $(datadir)/java since this is the common data dir for java files. Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de --- src/java/Makefile.am | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/java/Makefile.am

[PATCH 0/6] fix build and packaging issues

2013-01-04 Thread Danny Al-Gaaf
This set of patches contains fixes for some build and packaging issues. Danny Al-Gaaf (6): src/java/Makefile.am: fix default java dir ceph.spec.in: fix handling of java files ceph.spec.in: rename libcephfs-java package to cephfs-java ceph.spec.in: fix libcephfs-jni package name

Re: [PATCH 0/2] Librados aio stat

2013-01-04 Thread Sage Weil
Sorry, I missed this second set of patches when they came through over the break. We'll try to get them merged into master shortly. Thanks! sage On Fri, 4 Jan 2013, Filippos Giannakos wrote: Hi Team, Is there any progress or any comments regarding the librados aio stat patch ? Best

Re: [PATCH 0/6] fix build and packaging issues

2013-01-04 Thread Gregory Farnum
Thanks! Gary, can you pull these into a branch and do some before-and-after package comparisons on our systems (for the different distros in gitbuilder) and then merge into master? -Greg On Fri, Jan 4, 2013 at 9:51 AM, Danny Al-Gaaf danny.al-g...@bisect.de wrote: This set of patches contains

Re: [PATCH 0/6] fix build and packaging issues

2013-01-04 Thread Gary Lowell
No Problem. Cheers, Gary On Jan 4, 2013, at 10:00 AM, Gregory Farnum wrote: Thanks! Gary, can you pull these into a branch and do some before-and-after package comparisons on our systems (for the different distros in gitbuilder) and then merge into master? -Greg On Fri, Jan 4, 2013 at

Re: [PATCH 0/2] Librados aio stat

2013-01-04 Thread Josh Durgin
On 01/04/2013 05:01 AM, Filippos Giannakos wrote: Hi Team, Is there any progress or any comments regarding the librados aio stat patch ? They look good to me. I put them in the wip-librados-aio-stat branch. Can we add your signed-off-by to them? Thanks, Josh Best regards On 12/20/2012

Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2013-01-04 Thread Gregory Farnum
Sorry for the delay; I've been out on vacation... On Fri, Dec 14, 2012 at 6:09 AM, Lachfeld, Jutta jutta.lachf...@ts.fujitsu.com wrote: I do not have the full output of ceph pg dump for that specific TeraSort run, but here is a typical output after automatically preparing CEPH for a

Re: which Linux kernel version corresponds to 0.48argonaut?

2013-01-04 Thread Gregory Farnum
I think they might be different just as a consequence of being updated less recently; that's where all of the lines whose origin I recognize differ (not certain about the calc_parents stuff though). Sage can confirm. The specific issue you encountered previously was of course because you changed

Re: which Linux kernel version corresponds to 0.48argonaut?

2013-01-04 Thread Sage Weil
On Fri, 4 Jan 2013, Gregory Farnum wrote: I think they might be different just as a consequence of being updated less recently; that's where all of the lines whose origin I recognize differ (not certain about the calc_parents stuff though). Sage can confirm. In general, the crush files in

Re: ceph stability

2013-01-04 Thread Gregory Farnum
On Fri, Dec 21, 2012 at 2:07 AM, Amon Ott a...@m-privacy.de wrote: Am 20.12.2012 15:31, schrieb Mark Nelson: On 12/20/2012 01:08 AM, Roman Hlynovskiy wrote: Hello Mark, for multi-mds solutions do you refer to multi-active arch or 1 active and many standby arch? That's a good question! I

Re: Any idea about doing deduplication in ceph?

2013-01-04 Thread Gregory Farnum
On Wed, Dec 26, 2012 at 6:16 PM, lollipop lollipop_...@126.com wrote: Nowadays, I am wondering about doing offline deduplication in ceph. My idea is: First in the ceph-client, I try to get the locations of chunks in one file. The information includes how many chunks the file has and which osd the

ceph caps (Ganesha + Ceph pnfs)

2013-01-04 Thread Matt W. Benjamin
Hi Ceph folks, Summarizing from Ceph IRC discussion by request, I'm one of the developers of a pNFS (parallel nfs) implementation that is built atop the Ceph system. I'm working on code that wants to use the Ceph caps system to control and sequence i/o operations and file metadata, for

argonaut stable update coming shortly

2013-01-04 Thread Sage Weil
We recently identified a bug in the way the OSD is marking its commits on XFS or ext4 (well, anything not btrfs) that could lead to data loss in the event of power loss or a kernel panic. We're doing some final testing and will have a v0.48.3argonaut package available by Monday. If you like,

librados/librbd compatibility issue with v0.56

2013-01-04 Thread Sage Weil
We identified a problem with the version of librbd/librados in v0.56. There will be a v0.56.1 update in a few days that fixes it. In the meantime, be aware that v0.56 ceph-osds may not interact properly with non-v0.56 radosgw or librbd clients, and v0.56 radosgw and librbd clients will not

Re: [PATCH REPOST] ceph: define ceph_encode_8_safe()

2013-01-04 Thread Dan Mick
Reviewed-by: Dan Mick dan.m...@inktank.com On 01/03/2013 11:07 AM, Alex Elder wrote: It's kind of a silly macro, but ceph_encode_8_safe() is the only one missing from an otherwise pretty complete set. It's not used, but neither are a couple of the others in this set. While in there, insert

Re: [PATCH REPOST] rbd: be picky about osd request status type

2013-01-04 Thread Dan Mick
I personally dislike spaces after cast, but I haven't checked the kernel style guide. Otherwise: Reviewed-by: Dan Mick dan.m...@inktank.com On 01/03/2013 02:40 PM, Alex Elder wrote: The result field in a ceph osd reply header is a signed 32-bit type, but rbd code often casually uses int to