[PATCH 6/6] libceph: verify requests queued in order

2013-03-25 Thread Alex Elder
Add checking to verify all osd requests for an osd are added to the queue in order of increasing tid. Signed-off-by: Alex Elder --- net/ceph/osd_client.c | 38 ++ 1 file changed, 38 insertions(+) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c in
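
A minimal userspace sketch of the kind of check this patch describes. It assumes a simple singly linked queue, and struct osd_req, struct req_queue and enqueue_checked() are hypothetical names, not the osd_client.c code. In this sketch an out-of-order tid is reported and refused; the actual patch may only warn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an osd request. */
struct osd_req {
	uint64_t tid;
	struct osd_req *next;
};

/* Hypothetical FIFO of requests; tail is the most recently queued. */
struct req_queue {
	struct osd_req *head;
	struct osd_req *tail;
};

/*
 * Append req at the tail after checking that its tid is greater than
 * the tid of the current tail. On a violation, report it and leave the
 * queue unchanged.
 */
static bool enqueue_checked(struct req_queue *q, struct osd_req *req)
{
	if (q->tail && req->tid <= q->tail->tid) {
		fprintf(stderr, "tid %llu queued after tid %llu\n",
			(unsigned long long)req->tid,
			(unsigned long long)q->tail->tid);
		return false;
	}
	req->next = NULL;
	if (q->tail)
		q->tail->next = req;
	else
		q->head = req;
	q->tail = req;
	return true;
}
```

Refusing the request keeps the invariant easy to test here; a kernel version would more likely just use WARN_ON().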

[PATCH 5/6] libceph: send queued requests when starting new one

2013-03-25 Thread Alex Elder
An osd expects the transaction ids of arriving request messages from a given client to increase monotonically. So the osd client needs to send its requests in ascending tid order. The transaction id for a request is set at the time it is registered, in __register_request(). This is also where th
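
A minimal sketch of the ordering argument. It assumes array-based queues, and register_request(), queue_unsent() and start_request() are hypothetical helpers, not the real __register_request(). Because the tid is assigned at registration, flushing older unsent requests before the new one keeps tids ascending on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQS 16

/* Hypothetical, simplified osd client; tids come from a counter. */
struct osd_client {
	uint64_t last_tid;
	uint64_t unsent[MAX_REQS];  /* registered but not yet sent, oldest first */
	size_t nr_unsent;
	uint64_t sent[MAX_REQS];    /* tids in the order they were sent */
	size_t nr_sent;
};

/* Registration assigns the tid, so tids grow in registration order. */
static uint64_t register_request(struct osd_client *osdc)
{
	return ++osdc->last_tid;
}

/* Append a registered request to the tail of the unsent list. */
static void queue_unsent(struct osd_client *osdc, uint64_t tid)
{
	assert(osdc->nr_unsent < MAX_REQS);
	osdc->unsent[osdc->nr_unsent++] = tid;
}

/* Send everything on the unsent list, lowest tid first. */
static void send_queued(struct osd_client *osdc)
{
	for (size_t i = 0; i < osdc->nr_unsent; i++) {
		assert(osdc->nr_sent < MAX_REQS);
		osdc->sent[osdc->nr_sent++] = osdc->unsent[i];
	}
	osdc->nr_unsent = 0;
}

/*
 * Start a new request: queue it behind any older requests that are
 * still unsent, then flush the whole list, so the new tid cannot
 * reach the osd ahead of an older one.
 */
static uint64_t start_request(struct osd_client *osdc)
{
	uint64_t tid = register_request(osdc);

	queue_unsent(osdc, tid);
	send_queued(osdc);
	return tid;
}
```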

[PATCH 4/6] libceph: keep request lists in tid order

2013-03-25 Thread Alex Elder
In __map_request(), when adding a request to an osd client's unsent list, add it to the tail rather than the head. That way the newest entries (with the highest tid value) will be last. Maintain an osd's request list in order of increasing tid also. Finally--to be consistent--maintain an osd cli
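
Why tail insertion matters can be shown with a small userspace re-creation of the kernel's circular list, where list_add() inserts after the head and list_add_tail() inserts before it. These are reimplementations for illustration only, and struct req is a hypothetical stand-in for an osd request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal userspace re-creation of the kernel's circular list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_insert_between(struct list_head *n, struct list_head *prev,
				struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: the newest entry ends up first. */
static void list_add(struct list_head *n, struct list_head *head)
{
	list_insert_between(n, head, head->next);
}

/* Insert right before the head: the newest entry ends up last. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	list_insert_between(n, head->prev, head);
}

/* Hypothetical simplified request, linked into a list through 'item'. */
struct req {
	uint64_t tid;
	struct list_head item;
};

/* Recover the enclosing request from its list linkage. */
#define req_of(p) ((struct req *)((char *)(p) - offsetof(struct req, item)))

/* Copy tids into out[] in list order; returns the number copied. */
static size_t list_tids(struct list_head *head, uint64_t *out, size_t max)
{
	size_t n = 0;

	for (struct list_head *p = head->next; p != head && n < max; p = p->next)
		out[n++] = req_of(p)->tid;
	return n;
}
```

Adding tids 1, 2, 3 with list_add() yields 3, 2, 1, while list_add_tail() yields 1, 2, 3, which is the increasing-tid order the patch wants.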

[PATCH 3/6] libceph: prepend requests in order when kicking

2013-03-25 Thread Alex Elder
The osd expects incoming requests from a given client to arrive in order, with the tid for each request being greater than the tid for requests that have already arrived. This patch fixes one place the osd client might not maintain that ordering. For the osd client, the connection fault method is
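
Prepending requests one at a time reverses them. This is a minimal sketch that assumes the requests to resend are available sorted by increasing tid and that the unsent queue is a hypothetical array; struct unsent_queue, prepend() and kick_in_order() are illustrative names. Walking the list from newest to oldest and prepending each request puts them back in order, ahead of the requests that are still unsent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REQS 16

/* Hypothetical unsent queue held as an array, oldest tid at index 0. */
struct unsent_queue {
	uint64_t tid[MAX_REQS];
	size_t len;
};

/* Put one tid at the very front of the queue. */
static void prepend(struct unsent_queue *q, uint64_t tid)
{
	assert(q->len < MAX_REQS);
	memmove(&q->tid[1], &q->tid[0], q->len * sizeof(q->tid[0]));
	q->tid[0] = tid;
	q->len++;
}

/*
 * Requeue already-sent requests (sorted by increasing tid) ahead of the
 * unsent ones. Prepending one at a time reverses order, so walk the
 * sent list from newest to oldest; the oldest ends up at the front.
 */
static void kick_in_order(struct unsent_queue *q, const uint64_t *sent, size_t n)
{
	for (size_t i = n; i > 0; i--)
		prepend(q, sent[i - 1]);
}
```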

[PATCH 2/6] libceph: no more kick_requests() race

2013-03-25 Thread Alex Elder
Since we no longer drop the request mutex between registering and mapping an osd request in ceph_osdc_start_request(), there is no chance of a race with kick_requests(). We can now therefore map and send the new request unconditionally (but we'll issue a warning should it ever occur). Signed-off-

[PATCH 1/6] libceph: slightly defer registering osd request

2013-03-25 Thread Alex Elder
One of the first things ceph_osdc_start_request() does is register the request. It then acquires the osd client's map semaphore and request mutex and proceeds to map and send the request. There is no reason the request has to be registered before acquiring the map semaphore. So hold off doing so

[PATCH 0/6] libceph: send osd requests in tid order

2013-03-25 Thread Alex Elder
This series rearranges the way osd requests are placed onto an osd client's unsent request list so that they are kept in order based on their transaction ids. The osd expects its requests from a client to have monotonically increasing tids. Since requests are sent to the osd in the order they are

RE: [ceph-users] Ceph Crash at sync_thread_timeout after heavy random writes.

2013-03-25 Thread Chen, Xiaoxi
Hi Sage, Thanks for your mail. When I turn on filestore sync flush, it seems to work and the OSD process doesn't suicide any more. I disabled the flusher long ago, since both Mark's report and mine show that disabling the flusher seems to improve performance (so my original configuration is filestore_

Re: corruption of active mmapped files in btrfs snapshots

2013-03-25 Thread Chris Mason
Quoting Chris Mason (2013-03-22 16:31:42) > Going through the code here, when I change the test to truncate once in > the very beginning, I still get errors. So, it isn't an interaction > between mmap and truncate. It must be a problem between lzo and mmap. With compression off, we use clear_pag

prototype incremental rbd backup

2013-03-25 Thread Sage Weil
The wip-rbd-diff branch has an early prototype of rbd incremental backup/restore. Currently it works like this:
    rbd export-diff myimage@to --from-snap from - | \
        rbd import-diff - myimagecopy
The import applies the update stream and then creates the final 'to' snap (which is referenced in the strea
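
A toy model of what applying the update stream means, for illustration only. The record layout below (enum diff_kind, struct diff_record) is hypothetical and is not the actual rbd export-diff format. Each record either writes data or zeroes a range, and the final snapshot is taken only after every record has been applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_SIZE 64

/* Hypothetical record types; NOT the real rbd diff stream encoding. */
enum diff_kind { DIFF_WRITE, DIFF_ZERO };

struct diff_record {
	enum diff_kind kind;
	size_t offset;
	size_t len;
	const uint8_t *data;  /* only used for DIFF_WRITE */
};

/* Hypothetical in-memory image with room for one snapshot. */
struct image {
	uint8_t bytes[IMAGE_SIZE];
	uint8_t snap[IMAGE_SIZE];  /* the final "to" snapshot */
	int has_snap;
};

/*
 * Apply every extent in the stream to the target image, then take the
 * final snapshot. Returns -1, without snapshotting, if a record falls
 * outside the image.
 */
static int import_diff(struct image *img, const struct diff_record *recs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct diff_record *r = &recs[i];

		if (r->offset > IMAGE_SIZE || r->len > IMAGE_SIZE - r->offset)
			return -1;
		if (r->kind == DIFF_WRITE)
			memcpy(img->bytes + r->offset, r->data, r->len);
		else
			memset(img->bytes + r->offset, 0, r->len);
	}
	memcpy(img->snap, img->bytes, IMAGE_SIZE);
	img->has_snap = 1;
	return 0;
}
```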

Re: v0.56.4 released

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Sage Weil wrote: > There is one minor change (fix) in the output to the 'ceph osd tree > --format=json' command. Please see the full release notes. Greg just reminded me about one additional note about upgrades (that should hopefully affect no one): * The MDS disk format has

v0.56.4 released

2013-03-25 Thread Sage Weil
There have been several important fixes that we've backported to bobtail that users are hitting in the wild. Most notably, there was a problem with pool names with - and _ that OpenStack users were hitting, and memory usage by ceph-osd and other daemons due to the trimming of in-memory logs. Th

Re: crush changes via cli

2013-03-25 Thread Sage Weil
On Sun, 24 Mar 2013, Gregory Farnum wrote: > On Sun, Mar 24, 2013 at 5:04 PM, Sage Weil wrote: > > On Sun, 24 Mar 2013, Gregory Farnum wrote: > >> On Fri, Mar 22, 2013 at 3:58 PM, Sage Weil >> (mailto:s...@inktank.com)> wrote: > >> > On Fri, 22 Mar 2013, Gregory Farnum wrote: > >> > > I suspect u

Re: Set object mtime

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Damien Churchill wrote: > On 25 March 2013 19:18, Wido den Hollander wrote: > > On 03/25/2013 12:27 PM, Damien Churchill wrote: > >> > >> Does anyone know if it's possible to set an object's mtime at all in RADOS? > >> > > > > You mean with a specific operation while not modify

Re: Set object mtime

2013-03-25 Thread Damien Churchill
On 25 March 2013 19:18, Wido den Hollander wrote: > On 03/25/2013 12:27 PM, Damien Churchill wrote: >> >> Does anyone know if it's possible to set an object's mtime at all in RADOS? >> > > You mean with a specific operation while not modifying the content of the > object? > > I checked the rados AP

Re: Set object mtime

2013-03-25 Thread Wido den Hollander
On 03/25/2013 12:27 PM, Damien Churchill wrote: Does anyone know if it's possible to set an object's mtime at all in RADOS? You mean with a specific operation while not modifying the content of the object? I checked the rados API, but I couldn't find any method which allows you to do so.

Re: crush changes via cli

2013-03-25 Thread Dan Mick
That's right. The "remove" versus "unlink" verbs make that pretty clear to me, at least... Are you suggesting this be clarified in the docs, or that the command set change? I think once we settle on the CLI, John can make a pass through the crush docs and make sure these commands are explained.

[PATCH] libceph: initialize data fields on last msg put

2013-03-25 Thread Alex Elder
(This patch is available in branch "review/wip-4540" of the ceph-client git repository, which is based on the current "testing" branch.) When the last reference to a ceph message is dropped, ceph_msg_last_put() is called to clean things up. For "normal" messages (allocated via ceph_msg_new() rath
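
A minimal sketch of the idea, assuming a simplified message with a single data pointer and a one-slot pool; struct msg, struct msg_pool, msg_put() and msg_last_put() are hypothetical, not the libceph API. Clearing the data fields on the last put means a message recycled through the pool starts clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified message with attached data and a refcount. */
struct msg {
	int refcount;
	bool from_pool;      /* pooled messages are recycled, not freed */
	void *data;          /* caller-attached payload */
	size_t data_length;
};

/* Hypothetical one-slot pool standing in for a message pool. */
struct msg_pool {
	struct msg *free_slot;
};

/*
 * Called when the last reference is dropped. Clearing the data fields
 * first means a recycled message cannot hand a stale payload pointer
 * to its next user.
 */
static void msg_last_put(struct msg *m, struct msg_pool *pool)
{
	m->data = NULL;
	m->data_length = 0;
	if (m->from_pool)
		pool->free_slot = m;
	else
		free(m);
}

/* Drop one reference; the final drop cleans up. */
static void msg_put(struct msg *m, struct msg_pool *pool)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0)
		msg_last_put(m, pool);
}
```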

Re: [PATCH 5/6] libceph: wrap auth methods in a mutex

2013-03-25 Thread Alex Elder
On 03/25/2013 11:26 AM, Sage Weil wrote: >> > And ceph_auth_is_authenticated() here... > These *are* the wrappers :) D'oh! Good thing I quit commenting on those then. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org Mo

Re: [PATCH 6/6] libceph: verify authorizer reply

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Alex Elder wrote: > On 03/19/2013 06:08 PM, Sage Weil wrote: > > The 'cephx' auth protocol provides mutual authentication for client and > > server. However, as the client, we were not verifying that the server > > auth reply was in fact authentic. Although the infrastructure t
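
A minimal sketch of the client-side check. It assumes, as in cephx, that the server proves it holds the session key by returning the client's nonce plus one. Decryption is left out, and struct authorizer, struct authorizer_reply and verify_authorizer_reply() are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* What the client remembers about the authorizer it sent (simplified). */
struct authorizer {
	uint64_t nonce;
};

/*
 * Already-decrypted server reply. In the real protocol the reply is
 * encrypted with the session key; that step is omitted here.
 */
struct authorizer_reply {
	uint64_t nonce_plus_one;
};

/* Accept the server only if it echoed back our nonce incremented by one. */
static int verify_authorizer_reply(const struct authorizer *au,
				   const struct authorizer_reply *reply)
{
	if (reply->nonce_plus_one != au->nonce + 1)
		return -EPERM;
	return 0;
}
```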

Re: [PATCH 5/6] libceph: wrap auth methods in a mutex

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Alex Elder wrote: > On 03/19/2013 06:08 PM, Sage Weil wrote: > > The auth code is called from a variety of contexts, including the mon_client > > (protected by the monc's mutex) and the messenger callbacks (currently > > protected by nothing). Avoid chaos by protecting all auth
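
A minimal sketch of the locking pattern, using a pthread mutex in place of the kernel mutex; struct auth_client and its accessors are hypothetical. Every entry point takes the same lock, so callers from different contexts serialize on the shared auth state.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical auth client whose state is shared by several callers. */
struct auth_client {
	pthread_mutex_t mutex;
	int authenticated;   /* example of shared auth state */
};

static void auth_client_init(struct auth_client *ac)
{
	pthread_mutex_init(&ac->mutex, NULL);
	ac->authenticated = 0;
}

/*
 * Every entry point takes the same mutex, so a monitor-client caller
 * and a messenger callback never observe half-updated state.
 */
static void auth_set_authenticated(struct auth_client *ac, int value)
{
	pthread_mutex_lock(&ac->mutex);
	ac->authenticated = value;
	pthread_mutex_unlock(&ac->mutex);
}

static int auth_is_authenticated(struct auth_client *ac)
{
	int ret;

	pthread_mutex_lock(&ac->mutex);
	ret = ac->authenticated;
	pthread_mutex_unlock(&ac->mutex);
	return ret;
}
```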

Re: [PATCH 22/39] mds: handle linkage mismatch during cache rejoin

2013-03-25 Thread Gregory Farnum
On Thu, Mar 21, 2013 at 8:05 PM, Yan, Zheng wrote: > On 03/22/2013 05:23 AM, Gregory Farnum wrote: >> On Sun, Mar 17, 2013 at 7:51 AM, Yan, Zheng wrote: >>> From: "Yan, Zheng" >>> >>> For MDS cluster, not all file system namespace operations that impact >>> multiple MDS use two phase commit. Som

Re: [PATCH 3/6] libceph: add update_authorizer auth method

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Alex Elder wrote: > On 03/19/2013 06:08 PM, Sage Weil wrote: > > Currently the messenger calls out to a get_authorizer con op, which will > > create a new authorizer if it doesn't yet have one. In the meantime, when > > we rotate our service keys, the authorizer doesn't get up
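
A minimal sketch of an update_authorizer-style step. It assumes each ticket carries a secret_id that increases when service keys rotate, and that the authorizer remembers which secret_id it was built with; all names here are hypothetical. An existing authorizer is rebuilt only when it has fallen behind.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ticket: secret_id grows each time service keys rotate. */
struct ticket {
	uint64_t secret_id;
};

/* Hypothetical authorizer state. */
struct authorizer {
	uint64_t secret_id;   /* key generation this authorizer was built with */
	int builds;           /* how many times it has been (re)built */
};

static void build_authorizer(struct authorizer *au, const struct ticket *t)
{
	au->secret_id = t->secret_id;
	au->builds++;
}

/*
 * Called before reusing an existing authorizer: rebuild it if the
 * ticket has moved on to a newer key, otherwise keep it as is.
 */
static void update_authorizer(struct authorizer *au, const struct ticket *t)
{
	if (au->secret_id < t->secret_id)
		build_authorizer(au, t);
}
```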

Re: [PATCH 2/6] libceph: fix authorizer invalidation

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Alex Elder wrote: > On 03/19/2013 06:08 PM, Sage Weil wrote: > > We were invalidating the authorizer by removing the ticket handler > > entirely. This was effective in inducing us to request a new authorizer, > > but in the meantime it meant that any authorizer we generated wou
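
A minimal sketch of invalidating in place rather than removing the handler. It assumes a have_key flag marks whether a fresh ticket is needed; struct ticket_handler and both helpers are hypothetical. The need for a new ticket is still signaled, while the handler and the secret_id it holds stay put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-service ticket handler. */
struct ticket_handler {
	bool have_key;       /* false: a fresh ticket must be requested */
	uint64_t secret_id;  /* key generation the handler currently holds */
};

/*
 * Invalidate in place instead of deleting the handler: the client still
 * notices it needs a new ticket, but anything that looks up the handler
 * in the meantime finds the existing one rather than a newly created,
 * blank handler.
 */
static void invalidate_ticket(struct ticket_handler *th)
{
	th->have_key = false;
}

static bool need_new_ticket(const struct ticket_handler *th)
{
	return !th->have_key;
}
```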

Re: [PATCH] libceph: implement RECONNECT_SEQ feature

2013-03-25 Thread Sage Weil
On Mon, 25 Mar 2013, Alex Elder wrote: > On 03/19/2013 05:48 PM, Sage Weil wrote: > > This is an old protocol extension that allows the client and server to > > avoid resending old messages after a reconnect (following a socket error). > > Instead, they exchange their sequence numbers during the han
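
A minimal sketch of the reconnect step. It assumes the peer reports the highest sequence number it has received and the resend queue is held in ascending order; struct out_queue and discard_acked() are hypothetical. Anything at or below that number is dropped rather than resent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MSGS 16

/* Hypothetical resend queue: sequence numbers of unacknowledged messages. */
struct out_queue {
	uint64_t seq[MAX_MSGS];   /* ascending */
	size_t len;
};

/*
 * During the reconnect handshake the peer reports the highest sequence
 * number it already received. Those messages are dropped instead of
 * being resent; only newer ones stay queued.
 */
static size_t discard_acked(struct out_queue *q, uint64_t peer_in_seq)
{
	size_t skip = 0;

	while (skip < q->len && q->seq[skip] <= peer_in_seq)
		skip++;
	for (size_t i = skip; i < q->len; i++)
		q->seq[i - skip] = q->seq[i];
	q->len -= skip;
	return skip;   /* number of messages that no longer need resending */
}
```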

Re: [ceph-users] Ceph Crash at sync_thread_timeout after heavy random writes.

2013-03-25 Thread Sage Weil
Hi Xiaoxi, On Mon, 25 Mar 2013, Chen, Xiaoxi wrote: > From ceph -w, Ceph reports a very high ops rate (1+ /s), but > technically, 80 spindles can provide up to 150*80/2=6000 IOPS for 4K random > write. > > When digging into the code, I found that the OSD writes data to > Pagecac
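
The 6000 IOPS ceiling is plain arithmetic. A tiny helper follows, assuming the /2 reflects each client write costing two disk writes (for example 2x replication; the excerpt does not say which).

```c
#include <assert.h>

/*
 * Rough ceiling on client-visible random-write IOPS for a set of disks.
 * Assumption: each client write turns into 'write_amplification' disk
 * writes (2 in the estimate above, e.g. from 2x replication).
 */
static unsigned max_client_write_iops(unsigned iops_per_spindle,
				      unsigned spindles,
				      unsigned write_amplification)
{
	return iops_per_spindle * spindles / write_amplification;
}
```

With 150 IOPS per spindle, 80 spindles and a factor of 2 this gives 6000. A reported rate far above that ceiling suggests writes are being absorbed somewhere other than the disks, such as the page cache.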

Re: corruption of active mmapped files in btrfs snapshots

2013-03-25 Thread David Sterba
On Sat, Mar 23, 2013 at 06:48:38AM -0300, Alexandre Oliva wrote: > On Mar 22, 2013, David Sterba wrote: > > > I've reproduced this without compression, with autodefrag on. > > I don't have autodefrag on, unless it's enabled by default on 3.8.3 or > on the for-linus tree. It's not on by default,

Re: [PATCH 6/6] libceph: verify authorizer reply

2013-03-25 Thread Alex Elder
On 03/19/2013 06:08 PM, Sage Weil wrote: > The 'cephx' auth protocol provides mutual authentication for client and > server. However, as the client, we were not verifying that the server > auth reply was in fact authentic. Although the infrastructure to do so was > all in place, we neglected to act

Re: [PATCH 5/6] libceph: wrap auth methods in a mutex

2013-03-25 Thread Alex Elder
On 03/19/2013 06:08 PM, Sage Weil wrote: > The auth code is called from a variety of contexts, including the mon_client > (protected by the monc's mutex) and the messenger callbacks (currently > protected by nothing). Avoid chaos by protecting all auth state with a > mutex. Nothing is blocking, so

Re: [PATCH 4/6] libceph: wrap auth ops in wrapper functions

2013-03-25 Thread Alex Elder
On 03/19/2013 06:08 PM, Sage Weil wrote: > Use wrapper functions that check whether the auth op exists so that callers > do not need a bunch of conditional checks. Simplifies the external > interface. > > Signed-off-by: Sage Weil You know I like this... It kind of modifies one or two earlier s
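
A minimal sketch of wrapper functions that hide optional ops. It assumes an ops table in which some methods may be NULL; struct auth_ops, the auth_* wrappers and partial_ops are hypothetical names. Callers invoke the wrapper unconditionally, and the wrapper decides whether there is anything to call.

```c
#include <assert.h>
#include <stddef.h>

struct auth_client;

/* Hypothetical ops table; some auth protocols leave entries NULL. */
struct auth_ops {
	int (*is_authenticated)(struct auth_client *ac);
	void (*invalidate_authorizer)(struct auth_client *ac, int peer_type);
};

struct auth_client {
	const struct auth_ops *ops;   /* NULL until a protocol is chosen */
};

/* Callers no longer test ac->ops or each method pointer themselves. */
static int auth_is_authenticated(struct auth_client *ac)
{
	if (!ac->ops || !ac->ops->is_authenticated)
		return 0;
	return ac->ops->is_authenticated(ac);
}

static void auth_invalidate_authorizer(struct auth_client *ac, int peer_type)
{
	if (ac->ops && ac->ops->invalidate_authorizer)
		ac->ops->invalidate_authorizer(ac, peer_type);
}

/* Example protocol that implements only one of the two ops. */
static int always_authenticated(struct auth_client *ac)
{
	(void)ac;
	return 1;
}

static const struct auth_ops partial_ops = {
	.is_authenticated = always_authenticated,
};
```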

Re: [PATCH 3/6] libceph: add update_authorizer auth method

2013-03-25 Thread Alex Elder
On 03/19/2013 06:08 PM, Sage Weil wrote: > Currently the messenger calls out to a get_authorizer con op, which will > create a new authorizer if it doesn't yet have one. In the meantime, when > we rotate our service keys, the authorizer doesn't get updated. Eventually > it will be rejected by the

RE: [ceph-users] Ceph Crash at sync_thread_timeout after heavy random writes.

2013-03-25 Thread Chen, Xiaoxi
Rephrasing it to make it clearer. From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Chen, Xiaoxi Sent: March 25, 2013 17:02 To: 'ceph-us...@lists.ceph.com' (ceph-us...@lists.ceph.com) Cc: ceph-devel@vger.kernel.org Subject: [ceph-users] Ceph Crash at sync

Re: [PATCH 2/6] libceph: fix authorizer invalidation

2013-03-25 Thread Alex Elder
On 03/19/2013 06:08 PM, Sage Weil wrote: > We were invalidating the authorizer by removing the ticket handler > entirely. This was effective in inducing us to request a new authorizer, > but in the meantime it meant that any authorizer we generated would get a > new and initialized handler with sec

Re: [PATCH 1/6] libceph: clear messenger auth_retry flag when we authenticate

2013-03-25 Thread Alex Elder
On 03/19/2013 06:08 PM, Sage Weil wrote: > We maintain a counter of failed auth attempts to allow us to retry once > before failing. However, if the second attempt succeeds, the flag isn't > cleared, which makes us think auth failed again later when the connection > resets for other reasons (like
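
A minimal sketch of the retry-once logic with the fix applied, assuming a single auth_retry flag on the connection; struct connection, enum auth_result and handle_auth_reply() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of processing one auth reply. */
enum auth_result { AUTH_OK, AUTH_RETRY, AUTH_FAIL };

/* Hypothetical connection state: one retry is allowed after a failure. */
struct connection {
	int auth_retry;
};

static enum auth_result handle_auth_reply(struct connection *con, bool ok)
{
	if (ok) {
		con->auth_retry = 0;   /* the fix: forget earlier failures */
		return AUTH_OK;
	}
	if (con->auth_retry)
		return AUTH_FAIL;      /* already retried once: give up */
	con->auth_retry = 1;       /* first failure: allow one retry */
	return AUTH_RETRY;
}
```

Without the reset in the success branch, a sequence of fail, succeed, fail would end in AUTH_FAIL even though only one failure has happened since the last success.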

Re: [PATCH] libceph: implement RECONNECT_SEQ feature

2013-03-25 Thread Alex Elder
On 03/19/2013 05:48 PM, Sage Weil wrote: > This is an old protocol extension that allows the client and server to > avoid resending old messages after a reconnect (following a socket error). > Instead, they exchange their sequence numbers during the handshake. This > avoids sending a bunch of usele

Re: [PATCH 04/39] mds: make sure table request id unique

2013-03-25 Thread Yan, Zheng
On 03/22/2013 06:03 AM, Gregory Farnum wrote: > Right. I'd like to somehow mark those reqid's so that we can tell when > they come from a different incarnation of the MDS TableClient daemon. > One way is via some piece of random data that will probably > distinguish them, although if we have someth

Set object mtime

2013-03-25 Thread Damien Churchill
Does anyone know if it's possible to set an object's mtime at all in RADOS? Regards, Damien