Re: [PATCH v1 06/11] locks: convert to i_lock to protect i_flock list

2013-06-04 Thread Jeff Layton
On Tue, 4 Jun 2013 17:22:08 -0400 "J. Bruce Fields" wrote: > On Fri, May 31, 2013 at 11:07:29PM -0400, Jeff Layton wrote: > > Having a global lock that protects all of this code is a clear > > scalability problem. Instead of doing that, move most of the code to be > > protected by the i_lock inst

Re: [PATCH v1 08/11] locks: convert fl_link to a hlist_node

2013-06-04 Thread J. Bruce Fields
On Fri, May 31, 2013 at 11:07:31PM -0400, Jeff Layton wrote: > Testing has shown that iterating over the blocked_list for deadlock > detection turns out to be a bottleneck. In order to alleviate that, > begin the process of turning it into a hashtable. We start by turning > the fl_link into a hlist
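The hashtable direction described here can be pictured with a small userspace sketch (not the kernel code): blocked waiters are bucketed by a hash of the lock owner, so a deadlock check walks a single bucket instead of the whole global list. In the kernel the per-bucket head is an hlist_head and the per-lock linkage an hlist_node; the singly linked next pointer below merely stands in for that, and all names are invented for the example.

    /*
     * Illustrative sketch only (not the kernel code): blocked waiters are
     * bucketed by a hash of the lock owner, so a deadlock check scans one
     * bucket rather than one global list. Names are invented.
     */
    #include <stdio.h>

    #define BLOCKED_HASH_BITS 7
    #define BLOCKED_HASH_SIZE (1u << BLOCKED_HASH_BITS)

    struct blocked_lock {
        unsigned long owner;        /* the waiter */
        unsigned long waiting_on;   /* owner currently holding the lock */
        struct blocked_lock *next;  /* bucket chain; stands in for hlist_node */
    };

    static struct blocked_lock *blocked_hash[BLOCKED_HASH_SIZE];

    static unsigned bucket_of(unsigned long owner)
    {
        return (unsigned)(owner * 2654435761u) % BLOCKED_HASH_SIZE;
    }

    static void blocked_insert(struct blocked_lock *bl)
    {
        unsigned b = bucket_of(bl->owner);

        bl->next = blocked_hash[b];
        blocked_hash[b] = bl;
    }

    /* Is 'owner' itself waiting on someone? Only its bucket is scanned. */
    static struct blocked_lock *blocked_find(unsigned long owner)
    {
        struct blocked_lock *bl;

        for (bl = blocked_hash[bucket_of(owner)]; bl; bl = bl->next)
            if (bl->owner == owner)
                return bl;
        return NULL;
    }

    int main(void)
    {
        struct blocked_lock a = { .owner = 1, .waiting_on = 2 };
        struct blocked_lock b = { .owner = 2, .waiting_on = 3 };

        blocked_insert(&a);
        blocked_insert(&b);
        printf("owner 2 blocked? %s\n", blocked_find(2) ? "yes" : "no");
        return 0;
    }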

Re: [PATCH v1 07/11] locks: only pull entries off of blocked_list when they are really unblocked

2013-06-04 Thread J. Bruce Fields
On Fri, May 31, 2013 at 11:07:30PM -0400, Jeff Layton wrote: > Currently, when there is a lot of lock contention the kernel spends an > inordinate amount of time taking blocked locks off of the global > blocked_list and then putting them right back on again. When all of this > code was protected by
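A rough sketch of the idea as described: the waiter is taken off the global blocked list only when it really stops blocking, rather than being deleted and re-inserted on every wakeup of the retry loop. Everything below (the names, the print-based "list") is invented purely to show the difference in churn.

    /*
     * Illustrative only: delete+reinsert on every wakeup vs. removing a
     * waiter from the global blocked list once, when it is really
     * unblocked. Names are invented for the example.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct waiter { int id; bool on_list; };

    static void global_list_add(struct waiter *w)    { w->on_list = true;  printf("add %d\n", w->id); }
    static void global_list_remove(struct waiter *w) { w->on_list = false; printf("del %d\n", w->id); }

    /* Before: every wakeup that fails to acquire churns the global list. */
    static void wait_old(struct waiter *w, int wakeups, bool granted_last)
    {
        for (int i = 0; i < wakeups; i++) {
            global_list_remove(w);
            if (granted_last && i == wakeups - 1)
                return;                 /* got the lock */
            global_list_add(w);         /* still blocked: put it back */
        }
    }

    /* After: stay on the list until really unblocked, so a wakeup that
     * loses the race causes no list traffic at all. */
    static void wait_new(struct waiter *w, int wakeups, bool granted_last)
    {
        for (int i = 0; i < wakeups; i++) {
            if (granted_last && i == wakeups - 1) {
                global_list_remove(w);  /* really unblocked now */
                return;
            }
            /* lost the race: leave the entry where it is and sleep again */
        }
    }

    int main(void)
    {
        struct waiter a = { 1, false }, b = { 2, false };

        global_list_add(&a); wait_old(&a, 3, true);  /* several add/del pairs */
        global_list_add(&b); wait_new(&b, 3, true);  /* one add, one del */
        return 0;
    }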

Re: [PATCH v1 06/11] locks: convert to i_lock to protect i_flock list

2013-06-04 Thread J. Bruce Fields
On Fri, May 31, 2013 at 11:07:29PM -0400, Jeff Layton wrote: > Having a global lock that protects all of this code is a clear > scalability problem. Instead of doing that, move most of the code to be > protected by the i_lock instead. > > The exceptions are the global lists that file_lock->fl_link
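As a rough illustration of the locking change, here is a userspace sketch with pthread mutexes standing in for the kernel's spinlocks and heavily simplified structures: each inode's i_flock list is protected by that inode's own i_lock, so lock operations on different files no longer serialize on one global lock. None of this is the kernel code itself.

    /*
     * Illustrative sketch: a per-"inode" lock protects that inode's own
     * lock list, instead of one global lock protecting everything.
     */
    #include <pthread.h>
    #include <stdio.h>

    struct file_lock {
        int owner;
        struct file_lock *next;
    };

    struct inode {
        pthread_mutex_t i_lock;      /* per-inode lock */
        struct file_lock *i_flock;   /* this inode's lock list */
    };

    static void inode_init(struct inode *inode)
    {
        pthread_mutex_init(&inode->i_lock, NULL);
        inode->i_flock = NULL;
    }

    static void posix_lock_insert(struct inode *inode, struct file_lock *fl)
    {
        pthread_mutex_lock(&inode->i_lock);   /* was: one global lock */
        fl->next = inode->i_flock;
        inode->i_flock = fl;
        pthread_mutex_unlock(&inode->i_lock);
    }

    static int count_locks(struct inode *inode)
    {
        int n = 0;

        pthread_mutex_lock(&inode->i_lock);
        for (struct file_lock *fl = inode->i_flock; fl; fl = fl->next)
            n++;
        pthread_mutex_unlock(&inode->i_lock);
        return n;
    }

    int main(void)
    {
        struct inode a, b;
        struct file_lock fa = { .owner = 1 }, fb = { .owner = 2 };

        inode_init(&a);
        inode_init(&b);
        posix_lock_insert(&a, &fa);   /* touches only a's i_lock */
        posix_lock_insert(&b, &fb);   /* touches only b's i_lock */
        printf("a: %d lock(s), b: %d lock(s)\n", count_locks(&a), count_locks(&b));
        return 0;
    }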

Re: PGLog::merge_log clarification

2013-06-04 Thread Samuel Just
Within a single log, the version must always be increasing. The reason for the epoch is that you might have two logs:

    a       b
    (1, 1)  (1, 1)
    (1, 2)  (1, 2)
    (1, 3)  (2, 3)
            (2, 4)

If a dies after persisting (1, 3) but before sending it to b, b might become primary and go on accep
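A tiny sketch of the ordering being discussed: entries are compared epoch-first, then version, so b's entries written as primary in epoch 2 sort after the (1, 3) entry that a persisted but never shipped. Ceph's real type is eversion_t; the code below only illustrates the comparison rule.

    /* Illustrative only: (epoch, version) pairs compared epoch-first. */
    #include <stdio.h>

    struct eversion { unsigned epoch, version; };

    /* negative if a < b, 0 if equal, positive if a > b; epoch dominates */
    static int eversion_cmp(struct eversion a, struct eversion b)
    {
        if (a.epoch != b.epoch)
            return a.epoch < b.epoch ? -1 : 1;
        if (a.version != b.version)
            return a.version < b.version ? -1 : 1;
        return 0;
    }

    int main(void)
    {
        /* log a ends at (1,3); b, primary in epoch 2, ends at (2,4) */
        struct eversion a_head = { 1, 3 };
        struct eversion b_head = { 2, 4 };

        printf("b's head is %s a's head\n",
               eversion_cmp(b_head, a_head) > 0 ? "newer than" : "not newer than");
        return 0;
    }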

PGLog::merge_log clarification

2013-06-04 Thread Loic Dachary
Hi Sam, When calling merge_log on the following: http://pastealacon.com/32449 (where the first column is the versions of the log entries and the second column is the versions of the olog entries), I expect 5,2 and 6,1 from log to be added to divergent https://github.com/ceph/ceph/blob/maste

Re: [PATCH v1 04/11] locks: make "added" in __posix_lock_file a bool

2013-06-04 Thread J. Bruce Fields
On Fri, May 31, 2013 at 11:07:27PM -0400, Jeff Layton wrote: > ...save 3 bytes of stack space. > > Signed-off-by: Jeff Layton ACK. --b. > --- > fs/locks.c | 9 +++++---- > 1 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/fs/locks.c b/fs/locks.c > index a7d2253..cef0e04 1006

Re: [PATCH v1 05/11] locks: encapsulate the fl_link list handling

2013-06-04 Thread J. Bruce Fields
On Fri, May 31, 2013 at 11:07:28PM -0400, Jeff Layton wrote: > Move the fl_link list handling routines into a separate set of helpers. > Also move the global list handling out of locks_insert_block, and into > the caller that ends up triggering it as that allows us to eliminate the > IS_POSIX check
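Roughly what "encapsulating the list handling in helpers" looks like, as a userspace sketch: the global-list insert and delete become small wrapper functions, and the caller, not locks_insert_block(), decides whether to invoke them (e.g. only for locks that participate in deadlock detection). All names below are stand-ins, not the kernel's actual helpers.

    /* Illustrative sketch: global-list manipulation wrapped in helpers
     * so the caller decides when the global list is involved at all. */
    #include <stdio.h>

    struct file_lock {
        int fl_is_posix;             /* illustrative flag */
        struct file_lock *fl_link;   /* global list linkage */
    };

    static struct file_lock *blocked_list;   /* global list of blocked locks */

    static void locks_insert_global_blocked(struct file_lock *fl)
    {
        fl->fl_link = blocked_list;
        blocked_list = fl;
    }

    static void locks_delete_global_blocked(struct file_lock *fl)
    {
        struct file_lock **p = &blocked_list;

        while (*p && *p != fl)
            p = &(*p)->fl_link;
        if (*p)
            *p = fl->fl_link;
    }

    int main(void)
    {
        struct file_lock fl = { .fl_is_posix = 1 };

        /* The caller, not the generic insert path, applies the IS_POSIX-style
         * check and then uses the helpers. */
        if (fl.fl_is_posix)
            locks_insert_global_blocked(&fl);
        locks_delete_global_blocked(&fl);
        printf("list empty: %s\n", blocked_list == NULL ? "yes" : "no");
        return 0;
    }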

Re: flatten rbd export / export-diff ?

2013-06-04 Thread Josh Durgin
On 06/04/2013 11:04 AM, Stefan Priebe wrote: On 04.06.2013 17:23, Sage Weil wrote: On Tue, 4 Jun 2013, Stefan Priebe - Profihost AG wrote: Hi, is there a way to flatten the rbd export-diff to a "new" image FILE? Or do I always have to: rbd import "OLD BASE IMAGE" rbd import-diff diff1 rbd i

Re: flatten rbd export / export-diff ?

2013-06-04 Thread Stefan Priebe
On 04.06.2013 17:23, Sage Weil wrote: On Tue, 4 Jun 2013, Stefan Priebe - Profihost AG wrote: Hi, is there a way to flatten the rbd export-diff to a "new" image FILE? Or do I always have to: rbd import "OLD BASE IMAGE" rbd import-diff diff1 rbd import-diff diff1-2 rbd import-diff diff2-3 rbd

Re: RGW and Keystone

2013-06-04 Thread Chmouel Boudjnah
Hello Yehuda, Sorry, this was actually directed to you (Florian actually told me you were the go-to person for rgw). I am not totally familiar with S3; how does a single namespace ensure that one account/user doesn't have access to the resources of the others? Glad to know you are tackling the multi-te

Re: Operation per second meaning

2013-06-04 Thread Gregory Farnum
On Tue, Jun 4, 2013 at 3:26 AM, Roman Alekseev wrote: > Hello, > > Please help me to understand the op/s (operations per second) value in the 'pgmap > v71520: 3352 pgs: 3352 active+clean; 212 GB data, 429 GB used, 23444 GB / > 23874 GB avail; 89237KB/s wr, 24op/s' line? > What does "operation" mean? What

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Christoph Hellwig
Having RCU for modification-mostly workloads is never a good idea, so I don't think it makes sense to mention it here. If you care about the overhead, it's worth trying to use per-cpu lists, though.

Re: flatten rbd export / export-diff ?

2013-06-04 Thread Sage Weil
On Tue, 4 Jun 2013, Stefan Priebe - Profihost AG wrote: > Hi, > > is there a way to flatten the rbd export-diff to a "new" image FILE? Or > do I always have to: > > rbd import "OLD BASE IMAGE" > rbd import-diff diff1 > rbd import-diff diff1-2 > rbd import-diff diff2-3 > rbd import-diff diff3-4 >

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Jeff Layton
On Tue, 4 Jun 2013 10:53:22 -0400 "J. Bruce Fields" wrote: > On Tue, Jun 04, 2013 at 07:46:40AM -0700, Christoph Hellwig wrote: > > Having RCU for modification mostly workloads never is a good idea, so > > I don't think it makes sense to mention it here. > > > > If you care about the overhead it

Re: RGW and Keystone

2013-06-04 Thread Yehuda Sadeh
(resending due to formatting) I'm not Florian, but I think I can help here. The radosgw user-tenant model is (currently) different from the swift one. It's more like the S3 model, where users live in a single namespace. So the current Swift user mapping is not perfect. We created 'subusers' to em

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Jeff Layton
On Tue, 4 Jun 2013 07:46:40 -0700 Christoph Hellwig wrote: > Having RCU for modification mostly workloads never is a good idea, so > I don't think it makes sense to mention it here. > > If you care about the overhead it's worth trying to use per-cpu lists, > though. > Yeah, I looked at those t

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread J. Bruce Fields
On Tue, Jun 04, 2013 at 07:46:40AM -0700, Christoph Hellwig wrote: > Having RCU for modification mostly workloads never is a good idea, so > I don't think it makes sense to mention it here. > > If you care about the overhead it's worth trying to use per-cpu lists, > though. Yes. The lock and unl

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Jeff Layton
On Tue, 04 Jun 2013 16:19:53 +0200 "Stefan (metze) Metzmacher" wrote: > Hi Jeff, > > > There's no reason we have to protect the blocked_hash and file_lock_list > > with the same spinlock. With the tests I have, breaking it in two gives > > a barely measurable performance benefit, but it seems re

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Stefan (metze) Metzmacher
Hi Jeff, > There's no reason we have to protect the blocked_hash and file_lock_list > with the same spinlock. With the tests I have, breaking it in two gives > a barely measurable performance benefit, but it seems reasonable to make > this locking as granular as possible. as file_lock_{list,lock}
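The split being proposed can be pictured with a minimal sketch, with pthread mutexes standing in for spinlocks and plain counters standing in for the real structures: the blocked hash and the global file_lock list each get their own lock, so taking one no longer contends with users of the other. Names are illustrative, not the kernel's.

    /* Illustrative sketch: two independent locks, one per global structure. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t file_lock_list_lock = PTHREAD_MUTEX_INITIALIZER; /* protects file_lock_list */
    static pthread_mutex_t blocked_lock_lock   = PTHREAD_MUTEX_INITIALIZER; /* protects blocked_hash */

    static int file_lock_list_len;   /* stand-in for the global lock list */
    static int blocked_hash_len;     /* stand-in for the blocked hash */

    static void insert_file_lock(void)
    {
        pthread_mutex_lock(&file_lock_list_lock);
        file_lock_list_len++;
        pthread_mutex_unlock(&file_lock_list_lock);
    }

    static void insert_blocked(void)
    {
        pthread_mutex_lock(&blocked_lock_lock);   /* independent of the list lock */
        blocked_hash_len++;
        pthread_mutex_unlock(&blocked_lock_lock);
    }

    int main(void)
    {
        insert_file_lock();
        insert_blocked();
        printf("locks: %d, blocked: %d\n", file_lock_list_len, blocked_hash_len);
        return 0;
    }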

flatten rbd export / export-diff ?

2013-06-04 Thread Stefan Priebe - Profihost AG
Hi, is there a way to flatten the rbd export-diff to a "new" image FILE? Or do I always have to: rbd import "OLD BASE IMAGE" rbd import-diff diff1 rbd import-diff diff1-2 rbd import-diff diff2-3 rbd import-diff diff3-4 rbd import-diff diff4-5 ... and so on? I would like to apply the diffs on loc

Re: libcephfs: Open-By-Handle API question

2013-06-04 Thread Matt W. Benjamin
Hi Ilya, The changes on this branch originated in our Ganesha NFS driver for Ceph, so I'm not sure where the gap is, if any. I'll send an update to the list when we've finished re-integrating against the libcephfs-wip merge branch. Matt - "Ilya Storozhilov" wrote: > Hi Ceph developers, > >

Re: [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Jeff Layton
On Tue, 4 Jun 2013 07:56:44 -0400 Jim Rees wrote: > Jeff Layton wrote: > > > Might be nice to look at some profiles to confirm all of that. I'd also > > be curious how much variation there was in the results above, as they're > > pretty close. > > > > The above is just a random re

Re: rationale for a PGLog::merge_old_entry case

2013-06-04 Thread Loic Dachary
Hi Sam, Thanks for the explanation. I misread the third case; my description was incorrect. I amended https://github.com/ceph/ceph/pull/340 accordingly. Cheers On 06/03/2013 10:28 PM, Samuel Just wrote: > In all three cases, we know the authoritative log does not contain an > entry for oe.soid,

Re: [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Jim Rees
Jeff Layton wrote: > Might be nice to look at some profiles to confirm all of that. I'd also > be curious how much variation there was in the results above, as they're > pretty close. > The above is just a random representative sample. The results are pretty close when running thi

Re: [PATCH v1 03/11] locks: comment cleanups and clarifications

2013-06-04 Thread Jeff Layton
On Mon, 3 Jun 2013 18:00:24 -0400 "J. Bruce Fields" wrote: > On Fri, May 31, 2013 at 11:07:26PM -0400, Jeff Layton wrote: > > Signed-off-by: Jeff Layton > > --- > > fs/locks.c | 24 +++- > > include/linux/fs.h | 6 ++ > > 2 files changed, 25 insertions(+), 5

Re: [PATCH v1 00/11] locks: scalability improvements for file locking

2013-06-04 Thread Jeff Layton
On Mon, 3 Jun 2013 17:31:01 -0400 "J. Bruce Fields" wrote: > On Fri, May 31, 2013 at 11:07:23PM -0400, Jeff Layton wrote: > > Executive summary (tl;dr version): This patchset represents an overhaul > > of the file locking code with an aim toward improving its scalability > > and making the code a

Operation per second meaning

2013-06-04 Thread Roman Alekseev
Hello, Please help me to understand the op/s (operations per second) value in the 'pgmap v71520: 3352 pgs: 3352 active+clean; 212 GB data, 429 GB used, 23444 GB / 23874 GB avail; 89237KB/s wr, 24op/s' line? What does "operation" mean? Thanks -- Kind regards, R. Alekseev

libcephfs: Open-By-Handle API question

2013-06-04 Thread Ilya Storozhilov
Hi Ceph developers, in order to provide an NFS frontend to CephFS data storage we are trying to use the new Open-By-Handle API from the 'src/include/cephfs/libcephfs.h' file, which is on the 'wip-libcephfs' branch at the moment. The API looks quite consistent and useful, but we couldn't find a method to g

RGW and Keystone

2013-06-04 Thread Chmouel Boudjnah
Hello Florian, I was wondering how the Keystone integration with Ceph works. I have been reading the documentation showing how to configure the keystone endpoints here: http://ceph.com/docs/next/radosgw/config/ and I don't see how the part: keystone endpoint-create --service-id --publi

[PATCH 2/2 v2] Add RADOS API lock tests

2013-06-04 Thread Filippos Giannakos
Add tests for the advisory locking API calls. Signed-off-by: Filippos Giannakos --- src/Makefile.am |6 + src/test/librados/lock.cc | 301 + 2 files changed, 307 insertions(+) create mode 100644 src/test/librados/lock.cc diff --git a/s

[PATCH 0/2 v2] librados: Add RADOS locks to the C/C++ API

2013-06-04 Thread Filippos Giannakos
Hi team, This set of patches exports the RADOS advisory locking functionality to the C/C++ API. They have been refactored to incorporate Josh's suggestions from his review:
* Always set tag to "" for exclusive lock
* Add duration argument to lock_{exclusive, shared}
* Add lock flags to librados.h

[PATCH 1/2 v2] Add RADOS lock mechanism to the librados C/C++ API.

2013-06-04 Thread Filippos Giannakos
Add functions to the librados C/C++ API to take advantage of the advisory locking system offered by RADOS. Signed-off-by: Filippos Giannakos --- src/Makefile.am | 5 +- src/include/rados/librados.h | 102 ++- src/include/rados/librados.hpp | 2
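For reference, the C half of this API is what eventually shipped in librados as rados_lock_exclusive(), rados_lock_shared(), and rados_unlock(). A minimal usage sketch follows; it assumes a reachable cluster with a default ceph.conf and a pool named "data", the object name, lock name, and cookie are placeholders, and the exact signatures in this v2 patch may differ in detail from the final API.

    /* Minimal sketch of the RADOS advisory lock calls from the C API.
     * Pool name, object name, lock name, and cookie are placeholders. */
    #include <rados/librados.h>
    #include <sys/time.h>
    #include <stdio.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        int ret;

        if (rados_create(&cluster, NULL) < 0 ||
            rados_conf_read_file(cluster, NULL) < 0 ||
            rados_connect(cluster) < 0) {
            fprintf(stderr, "failed to connect to cluster\n");
            return 1;
        }
        if (rados_ioctx_create(cluster, "data", &io) < 0) {
            fprintf(stderr, "failed to open pool\n");
            rados_shutdown(cluster);
            return 1;
        }

        /* Exclusive advisory lock on an object; NULL duration means no expiry. */
        ret = rados_lock_exclusive(io, "myobject", "mylock", "mycookie",
                                   "example description", NULL, 0);
        printf("lock_exclusive: %d\n", ret);

        if (ret == 0)
            rados_unlock(io, "myobject", "mylock", "mycookie");

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }

Build against the librados development headers and link with -lrados.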

Re: [ceph-users] Ceph killed by OS because of OOM under high load

2013-06-04 Thread Yan, Zheng
On Tue, Jun 4, 2013 at 2:20 PM, Chen, Xiaoxi wrote: > Hi Greg, > Yes, thanks for your advice; we did turn down the > osd_client_message_size_cap to 100MB/OSD, and both the journal queue and filestore > queue are set to 100MB also. > That's 300MB/OSD in total, but from top we see: >