RE: severe librbd performance degradation in Giant

2014-09-18 Thread Somnath Roy
Sage, any reason why the cache is enabled by default in Giant? Regarding profiling, I will try to run VTune/mutrace on this. Thanks & Regards, Somnath

How to use radosgw-admin to delete some or all users?

2014-09-18 Thread Zhao zhiming
Hi all, I know radosgw-admin can delete one user with the command 'radosgw-admin user rm uid=xxx'. Is there a command to delete multiple users, or all users? Thanks.

RE: severe librbd performance degradation in Giant

2014-09-18 Thread Chen, Xiaoxi
Same question as Somnath. Some of our customers are not that comfortable with the cache; they still have some consistency concerns.
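
For users who share that concern, librbd caching can be switched off explicitly on the client side. A minimal ceph.conf sketch, illustrative only (the thread's point is that Giant enables it by default):

    [client]
    rbd cache = false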

Re: severe librbd performance degradation in Giant

2014-09-18 Thread Alexandre DERUMIER
According to http://tracker.ceph.com/issues/9513, do you mean that rbd cache will cause a 10x performance degradation for random read? Hi, on my side, I don't see any read performance degradation (seq or rand) with or without it. firefly: around 12000 iops (with or without rbd_cache) giant:

Re: ARM NEON optimisations for gf-complete/jerasure/ceph-erasure

2014-09-18 Thread Janne Grunau
Hi Kevin, On 2014-09-16 11:25:12 -0700, Kevin Greenan wrote: I feel that separating the arch-specific implementations out and having a default 'generic' implementation would be a huge improvement. Note that gf-complete was in active development for some time before including the SIMD code.

v2 aligned buffer changes for erasure codes

2014-09-18 Thread Janne Grunau
Hi, following is an updated patchset. It now passes 'make check' in src. It has the following changes: * use 32-byte alignment since the isa plugin uses AVX2 (src/erasure-code/isa/README claims it needs 16*k byte aligned buffers but I can't see a reason why it would need more than 32 bytes) *

[PATCH v2 1/3] buffer: add an aligned buffer with less alignment than a page

2014-09-18 Thread Janne Grunau
SIMD optimized erasure code computation needs aligned memory. Buffers aligned to a page boundary are wasteful for it, though. The buffers used for the erasure code computation are typically smaller than a page. An alignment of 32 bytes is chosen to satisfy the needs of AVX/AVX2. Could be made arch

[PATCH v2 2/3] ec: use 32-byte aligned buffers

2014-09-18 Thread Janne Grunau
Requiring page aligned buffers and realigning the input if necessary creates measurable overhead. ceph_erasure_code_benchmark is ~30% faster with this change for technique=reed_sol_van,k=2,m=1. It also prevents a misaligned buffer when bufferlist::c_str(bufferlist) has to allocate a new buffer to
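
For context, a benchmark run along those lines might be invoked roughly as follows; the flag names are recalled from the ceph_erasure_code_benchmark tool and the size/iteration values are made up, so treat this as a sketch rather than a reproduction of the run quoted above:

    ceph_erasure_code_benchmark \
        --plugin jerasure \
        --workload encode \
        --iterations 1000 \
        --size 1048576 \
        --parameter technique=reed_sol_van \
        --parameter k=2 \
        --parameter m=1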

RE: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Andreas Joachim Peters
Hi Janne, (src/erasure-code/isa/README claims it needs 16*k byte aligned buffers): I should update the README since it is misleading ... it should say 8*k or 16*k byte aligned chunk size depending on the compiler/platform used; it is not the alignment of the allocated buffer addresses. The

RE: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Andreas Joachim Peters
Hi Janne/Loic, there is more confusion, at least on my side ... I have now had a look at the jerasure plug-in and I am slightly confused about why there are two ways to return a value in get_alignment ... one is as I assume and another one is per_chunk_alignment ... what should the function return, Loic? Cheers

Re: severe librbd performance degradation in Giant

2014-09-18 Thread Mark Nelson
On 09/18/2014 04:49 AM, Alexandre DERUMIER wrote: According to http://tracker.ceph.com/issues/9513, do you mean that rbd cache will cause a 10x performance degradation for random read? Hi, on my side, I don't see any read performance degradation (seq or rand) with or without it. firefly: around

Re: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Janne Grunau
Hi, On 2014-09-18 12:18:59 +0000, Andreas Joachim Peters wrote: (src/erasure-code/isa/README claims it needs 16*k byte aligned buffers): I should update the README since it is misleading ... it should say 8*k or 16*k byte aligned chunk size depending on the compiler/platform used, it is

snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Dan Van Der Ster
(moving this discussion to -devel) Begin forwarded message: From: Florian Haas flor...@hastexo.com Date: 17 Sep 2014 18:02:09 CEST Subject: Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU To: Dan Van Der Ster daniel.vanders...@cern.ch Cc: Craig Lewis cle...@centraldesktop.com,

Re: Fwd: S3 API Compatibility support

2014-09-18 Thread M Ranga Swami Reddy
Hi, could you please check and clarify the question below on object lifecycle and notification S3 API support: 1. To support the bucket lifecycle, we need to support moving/deleting objects/buckets based on lifecycle settings. For example, if an object lifecycle is set as below: 1.

Re: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Janne Grunau
Hi, On 2014-09-18 12:34:49 +0000, Andreas Joachim Peters wrote: there is more confusion, at least on my side ... I had now a look at the jerasure plug-in and I am now slightly confused why you have two ways to return in get_alignment ... one is as I assume and another one is

RE: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Andreas Joachim Peters
Hi Janne, For encoding there is normally a single buffer split 'virtually' into k pieces. To make all pieces start at an aligned address one needs to align the chunk size to e.g. 16*k. I don't get that. How is the buffer split? Into k (+ m) chunk-size parts? As long as the start and

Re: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Janne Grunau
On 2014-09-18 13:01:03 +0000, Andreas Joachim Peters wrote: For encoding there is normally a single buffer split 'virtually' into k pieces. To make all pieces start at an aligned address one needs to align the chunk size to e.g. 16*k. I don't get that. How is the buffer

RE: severe librbd performance degradation in Giant

2014-09-18 Thread Sage Weil
On Thu, 18 Sep 2014, Somnath Roy wrote: Sage, any reason why the cache is enabled by default in Giant? It's recommended practice to turn it on. It improves performance in general (especially with HDD OSDs). Do you mind comparing sequential small IOs? sage Regarding profiling, I will try

RE: v2 aligned buffer changes for erasure codes

2014-09-18 Thread Andreas Joachim Peters
I fail to see how the 32 * k is related to alignment. It's only used to pad the total size so it becomes a multiple of k * 32. That is ok since we want k 32-byte aligned chunks. The alignment for each chunk is just 32 bytes. Yes, agreed! The alignment for each chunk should be 32 bytes.
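
To make the padding rule concrete, here is a small worked example with made-up numbers (plain shell arithmetic, not code from the patchset):

    # pad a 4000-byte object for k=2 so each chunk stays a multiple of 32 bytes
    k=2; align=32; size=4000
    padded=$(( (size + k*align - 1) / (k*align) * (k*align) ))   # 4032
    chunk=$(( padded / k ))                                      # 2016, a multiple of 32
    echo "padded=$padded chunk=$chunk"

Since 2016 is 63*32, every chunk boundary inside a 32-byte aligned buffer is itself 32-byte aligned, which is the point of padding the total size to a multiple of k*32.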

Re: snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Florian Haas
Hi Dan, saw the pull request, and can confirm your observations, at least partially. Comments inline. On Thu, Sep 18, 2014 at 2:50 PM, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Do I understand your issue report correctly in that you have found setting osd_snap_trim_sleep to be
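
For readers who want to experiment with the same knob, it can be injected at runtime roughly as follows; the 0.1 value is an arbitrary illustration, not a recommendation taken from this thread:

    # apply to all OSDs at runtime
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'

    # or persist it in ceph.conf under [osd]:
    #   osd snap trim sleep = 0.1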

Re: radosgw-admin list users?

2014-09-18 Thread Yehuda Sadeh
On Thu, Sep 18, 2014 at 10:27 AM, Robin H. Johnson robb...@gentoo.org wrote: Related to this thread, radosgw-admin doesn't seem to have anything to list the users. The closest I have as a hack is: rados ls --pool=.users.uid |sed 's,.buckets$,,g' |sort |uniq Try: $ radosgw-admin metadata
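
To address the original question of removing many users at once, the metadata listing can be combined with 'user rm' in a loop. A rough sketch only: the tr-based parsing of the JSON listing is an assumption and breaks on user IDs containing whitespace, so review the list before running anything like this:

    # CAUTION: removes every user returned by the listing
    for uid in $(radosgw-admin metadata list user | tr -d '[]",'); do
        radosgw-admin user rm --uid="$uid"
    done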

RE: severe librbd performance degradation in Giant

2014-09-18 Thread Somnath Roy
Alexandre, what tool are you using? I used fio rbd. Also, I hope you have the Giant package installed on the client side as well and rbd_cache = true set in the client conf file. FYI, firefly librbd + librados and a Giant cluster will work seamlessly and I had to make sure fio rbd is really
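
For comparison, a minimal fio rbd invocation of the kind being discussed might look like this; the pool/image names and client name are placeholders, and the option set is assumed from fio's rbd ioengine rather than taken from this thread (rbd_cache itself is controlled by the client's ceph.conf, not by fio):

    fio --name=rbd-randread --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=testimage \
        --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
        --runtime=60 --time_based --direct=1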

Re: snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Florian Haas
On Thu, Sep 18, 2014 at 8:56 PM, Mango Thirtyfour daniel.vanders...@cern.ch wrote: Hi Florian, On Sep 18, 2014 7:03 PM, Florian Haas flor...@hastexo.com wrote: Hi Dan, saw the pull request, and can confirm your observations, at least partially. Comments inline. On Thu, Sep 18, 2014 at

Re: snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Dan van der Ster
Hi, September 18 2014 9:03 PM, Florian Haas flor...@hastexo.com wrote: On Thu, Sep 18, 2014 at 8:56 PM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi Florian, On Sep 18, 2014 7:03 PM, Florian Haas flor...@hastexo.com wrote: Hi Dan, saw the pull request, and can confirm your

Re: snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Dan van der Ster
-- Dan van der Ster || Data Storage Services || CERN IT Department -- September 18 2014 9:12 PM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi, September 18 2014 9:03 PM, Florian Haas flor...@hastexo.com wrote: On Thu, Sep 18, 2014 at 8:56 PM, Dan van der Ster

Re: radosgw-admin list users?

2014-09-18 Thread Robin H. Johnson
On Thu, Sep 18, 2014 at 10:38:19AM -0700, Yehuda Sadeh wrote: On Thu, Sep 18, 2014 at 10:27 AM, Robin H. Johnson robb...@gentoo.org wrote: Related to this thread, radosgw-admin doesn't seem to have anything to list the users. The closest I have as a hack is: rados ls --pool=.users.uid

Re: snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Florian Haas
On Thu, Sep 18, 2014 at 9:12 PM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi, September 18 2014 9:03 PM, Florian Haas flor...@hastexo.com wrote: On Thu, Sep 18, 2014 at 8:56 PM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi Florian, On Sep 18, 2014 7:03 PM, Florian Haas

Re: snap_trimming + backfilling is inefficient with many purged_snaps

2014-09-18 Thread Sage Weil
On Fri, 19 Sep 2014, Florian Haas wrote: Hi Sage, was the off-list reply intentional? Whoops! Nope :) On Thu, Sep 18, 2014 at 11:47 PM, Sage Weil sw...@redhat.com wrote: So, disaster is a pretty good description. Would anyone from the core team like to suggest another course of action

RE: severe librbd performance degradation in Giant

2014-09-18 Thread Shu, Xinxin
I also observed performance degradation on my full SSD setup. I got ~270K IOPS for 4KB random read with 0.80.4, but with latest master I only got ~12K IOPS. Cheers, xinxin

RE: severe librbd performance degradation in Giant

2014-09-18 Thread Shu, Xinxin
My bad, with latest master we got ~120K IOPS. Cheers, xinxin

Re: radosgw-admin list users?

2014-09-18 Thread Zhao zhiming
Thanks Robin and Yehuda, but I want to know how to delete multiple users. I used 'radosgw-admin metadata list user' to list all users, and found some users have unreadable characters in their names: radosgw-admin metadata list user [ zzm1, ?zzm1, ?zzm1 ] and I can't delete these unreadable users.

Re: [PATCH 1/3] libceph: reference counting pagelist

2014-09-18 Thread Sage Weil
On Tue, 16 Sep 2014, Yan, Zheng wrote: this allows pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng z...@redhat.com Reviewed-by: Sage Weil s...@redhat.com --- fs/ceph/mds_client.c | 1 - include/linux/ceph/pagelist.h | 5 -

Re: [PATCH 1/3] libceph: reference counting pagelist

2014-09-18 Thread Sage Weil
On Tue, 16 Sep 2014, Yan, Zheng wrote: this allows pagelist to present data that may be sent multiple times. Hmm, actually we probably should use the kref code for this, even though the refcounting is trivial. sage Signed-off-by: Yan, Zheng z...@redhat.com --- fs/ceph/mds_client.c

Re: [PATCH 2/3] ceph: use pagelist to present MDS request data

2014-09-18 Thread Sage Weil
On Tue, 16 Sep 2014, Yan, Zheng wrote: Current code uses a page array to present MDS request data. Pages in the array are allocated/freed by the caller of ceph_mdsc_do_request(). If the request is interrupted, the pages can be freed while they are still being used by the request message. The fix is

Re: [PATCH 3/3] ceph: include the initial ACL in create/mkdir/mknod MDS requests

2014-09-18 Thread Sage Weil
On Tue, 16 Sep 2014, Yan, Zheng wrote: Current code sets a new file/directory's initial ACL in a non-atomic manner. The client first sends a request to the MDS to create the new file/directory, then sets the initial ACL after the new file/directory is successfully created. The fix is to include the initial ACL

Re: [PATCH] ceph: move ceph_find_inode() outside the s_mutex

2014-09-18 Thread Sage Weil
On Wed, 17 Sep 2014, Yan, Zheng wrote: ceph_find_inode() may wait on freeing inode, using it inside the s_mutex may cause deadlock. (the freeing inode is waiting for OSD read reply, but dispatch thread is blocked by the s_mutex) Signed-off-by: Yan, Zheng z...@redhat.com Reviewed-by: Sage

Re: Fwd: S3 API Compatibility support

2014-09-18 Thread M Ranga Swami Reddy
Hi Sage, could you please advise if Ceph supports low-cost object storage (like Amazon Glacier or RRS) for archiving objects like log files, etc.? Thanks Swami On Thu, Sep 18, 2014 at 6:20 PM, M Ranga Swami Reddy swamire...@gmail.com wrote: Hi, could you please check and clarify the below

Re: Fwd: S3 API Compatibility support

2014-09-18 Thread Sage Weil
On Fri, 19 Sep 2014, M Ranga Swami Reddy wrote: Hi Sage, Could you please advise, if Ceph support the low cost object storages(like Amazon Glacier or RRS) for archiving objects like log file etc.? Ceph doesn't interact at all with AWS services like Glacier, if that's what you mean. For RRS,