Re: speedup ceph / scaling / find the bottleneck

2012-07-02 Thread Stefan Priebe - Profihost AG
On 2012-07-02 07:02, Alexandre DERUMIER wrote: Hi, my 2 cents: maybe with a lower range (like 100MB) of random I/O, you have a better chance of aggregating them into 4MB blocks? Yes, maybe. If you have just a range of 100MB, the chance that you'll hit the same 4MB block again is very high. @sage / mark How
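The arithmetic behind that intuition is simple to check. A sketch, assuming the default 4MB RBD object size and the 100MB range quoted above:

```shell
# RBD stripes an image into 4MB objects by default. A random I/O
# workload confined to a 100MB range can only ever touch this many
# distinct objects, so repeat hits (and aggregation) become likely.
object_size=$((4 * 1024 * 1024))
range=$((100 * 1024 * 1024))
echo "distinct 4MB objects in a 100MB range: $((range / object_size))"
# prints: distinct 4MB objects in a 100MB range: 25
```

With only 25 objects in play, random 4K writes revisit the same objects constantly; over a multi-GB image the same workload would scatter across thousands of objects.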

btrfs big metadata

2012-07-02 Thread Stefan Priebe - Profihost AG
Hello list, I found several people who use the big metadata options (-n 64k -l 64k) for ceph, but I haven't found any ceph doc or info on why to use this. What's the reason to use the big metadata feature with ceph? Greets, Stefan -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body

rbd rm allows removal of mapped device, nukes data, then returns -EBUSY

2012-07-02 Thread Florian Haas
Hi everyone, just wanted to check if this was the expected behavior -- it doesn't look like it would be, to me. What I do is create a 1G RBD, and just for the heck of it, make an XFS on it: root@alice:~# rbd create xfsdev --size 1024 root@alice:~# rbd map xfsdev root@alice:~# rbd showmapped id

Re: Radosgw installation and administration docs

2012-07-02 Thread Florian Haas
On Sun, Jul 1, 2012 at 10:22 PM, Chuanyu chua...@cs.nctu.edu.tw wrote: Hi Yehuda, Florian, I followed the wiki, and the steps you discussed, to construct my ceph system with the rados gateway, and I can use libs3 to upload files via radosgw (thanks a lot!), but got 405 Method Not Allowed when I use

Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem

2012-07-02 Thread Sha Zhengju
On 06/29/2012 01:21 PM, Sage Weil wrote: On Thu, 28 Jun 2012, Sha Zhengju wrote: From: Sha Zhengju <handai@taobao.com> In the following we will treat SetPageDirty and dirty page accounting as an integrated operation. Filesystems had better use the vfs interface directly to avoid those details.

Re:

2012-07-02 Thread Chuanyu Tsai
Chuanyu chuanyu at cs.nctu.edu.tw writes: Hi Yehuda, Florian, I followed the wiki, and the steps you discussed, to construct my ceph system with the rados gateway, and I can use libs3 to upload files via radosgw (thanks a lot!), but got 405 Method Not Allowed when I use swift, $ swift -v -A

Does radosgw really need to talk to an MDS?

2012-07-02 Thread Florian Haas
Hi everyone, radosgw(8) states that the following capabilities must be granted to the user that radosgw uses to connect to RADOS. ceph-authtool -n client.radosgw.gateway --cap mon 'allow r' --cap osd 'allow rwx' --cap mds 'allow' /etc/ceph/keyring.radosgw.gateway Could someone explain why we

Re: Does radosgw really need to talk to an MDS?

2012-07-02 Thread Wido den Hollander
Hi, On 02-07-12 13:41, Florian Haas wrote: Hi everyone, radosgw(8) states that the following capabilities must be granted to the user that radosgw uses to connect to RADOS. ceph-authtool -n client.radosgw.gateway --cap mon 'allow r' --cap osd 'allow rwx' --cap mds 'allow'

Assertion failure when radosgw can't authenticate

2012-07-02 Thread Florian Haas
Hi, in cephx-enabled clusters (0.47.x), authentication failures from radosgw seem to lead to an uncaught assertion failure: 2012-07-02 11:26:46.559830 b69c5730 0 librados: client.radosgw.charlie authentication error (1) Operation not permitted 2012-07-02 11:26:46.560093 b69c5730 -1 Couldn't

Re: Does radosgw really need to talk to an MDS?

2012-07-02 Thread Florian Haas
On Mon, Jul 2, 2012 at 1:44 PM, Wido den Hollander w...@widodh.nl wrote: You are not allowing the RADOS Gateway to do anything on the MDS. There is no 'r', 'w' or 'x' permission which you are allowing. So there is nothing the rgw has access to on the MDS. Yep, so we might as well leave off
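Following the thread's conclusion that the empty mds cap grants nothing and can simply be dropped, the resulting keyring entry would look something like this (a sketch; the key value is a placeholder, and the section name matches the client name used above):

```ini
[client.radosgw.gateway]
        key = <base64 key here>
        caps mon = "allow r"
        caps osd = "allow rwx"
```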

Re: speedup ceph / scaling / find the bottleneck

2012-07-02 Thread Stefan Priebe - Profihost AG
Hello, I just want to report back some test results. Just some results from a sheepdog test using the same hardware. Sheepdog: 1 VM: write: io=12544MB, bw=142678KB/s, iops=35669, runt= 90025msec read : io=14519MB, bw=165186KB/s, iops=41296, runt= 90003msec write: io=16520MB,

Re: Does radosgw really need to talk to an MDS?

2012-07-02 Thread Wido den Hollander
On 02-07-12 13:56, Florian Haas wrote: On Mon, Jul 2, 2012 at 1:44 PM, Wido den Hollander w...@widodh.nl wrote: You are not allowing the RADOS Gateway to do anything on the MDS. There is no 'r', 'w' or 'x' permission which you are allowing. So there is nothing the rgw has access to on the

Re: Interesting results

2012-07-02 Thread Jim Schutt
On 07/01/2012 01:57 PM, Stefan Priebe wrote: thanks for sharing. Which btrfs mount options did you use? -o noatime is all I use. -- Jim On 2012-06-29 00:37, Jim Schutt wrote: Hi, Lots of trouble reports go by on the list - I thought it would be useful to report a success. Using a

Re: Interesting results

2012-07-02 Thread Stefan Priebe - Profihost AG
On 2012-07-02 16:04, Jim Schutt wrote: On 07/01/2012 01:57 PM, Stefan Priebe wrote: thanks for sharing. Which btrfs mount options did you use? -o noatime is all I use. Thanks. Have you ever measured random I/O performance? Or is sequential all you need? Stefan

Re: Interesting results

2012-07-02 Thread Jim Schutt
On 07/02/2012 08:07 AM, Stefan Priebe - Profihost AG wrote: On 2012-07-02 16:04, Jim Schutt wrote: On 07/01/2012 01:57 PM, Stefan Priebe wrote: thanks for sharing. Which btrfs mount options did you use? -o noatime is all I use. Thanks. Have you ever measured random I/O performance? Or

Re: [PATCH 4/7] Use vfs __set_page_dirty interface instead of doing it inside filesystem

2012-07-02 Thread Sage Weil
On Mon, 2 Jul 2012, Sha Zhengju wrote: On 06/29/2012 01:21 PM, Sage Weil wrote: On Thu, 28 Jun 2012, Sha Zhengju wrote: From: Sha Zhengju <handai@taobao.com> In the following we will treat SetPageDirty and dirty page accounting as an integrated operation. Filesystems had better use

Re: Does radosgw really need to talk to an MDS?

2012-07-02 Thread Sage Weil
On Mon, 2 Jul 2012, Wido den Hollander wrote: On 02-07-12 13:56, Florian Haas wrote: On Mon, Jul 2, 2012 at 1:44 PM, Wido den Hollander w...@widodh.nl wrote: You are not allowing the RADOS Gateway to do anything on the MDS. There is no 'r', 'w' or 'x' permission which you are

Re: Assertion failure when radosgw can't authenticate

2012-07-02 Thread Sage Weil
On Mon, 2 Jul 2012, Florian Haas wrote: Hi, in cephx-enabled clusters (0.47.x), authentication failures from radosgw seem to lead to an uncaught assertion failure: 2012-07-02 11:26:46.559830 b69c5730 0 librados: client.radosgw.charlie authentication error (1) Operation not permitted

Re: rbd rm allows removal of mapped device, nukes data, then returns -EBUSY

2012-07-02 Thread Josh Durgin
On 07/01/2012 11:58 PM, Florian Haas wrote: Hi everyone, just wanted to check if this was the expected behavior -- it doesn't look like it would be, to me. What I do is create a 1G RBD, and just for the heck of it, make an XFS on it: root@alice:~# rbd create xfsdev --size 1024 root@alice:~#

Re: rbd rm allows removal of mapped device, nukes data, then returns -EBUSY

2012-07-02 Thread Gregory Farnum
On Mon, Jul 2, 2012 at 9:08 AM, Josh Durgin josh.dur...@inktank.com wrote: On 07/01/2012 11:58 PM, Florian Haas wrote: Hi everyone, just wanted to check if this was the expected behavior -- it doesn't look like it would be, to me. What I do is create a 1G RBD, and just for the heck of it,

Re: Does radosgw really need to talk to an MDS?

2012-07-02 Thread Gregory Farnum
On Mon, Jul 2, 2012 at 4:44 AM, Wido den Hollander w...@widodh.nl wrote: Hi, On 02-07-12 13:41, Florian Haas wrote: Hi everyone, radosgw(8) states that the following capabilities must be granted to the user that radosgw uses to connect to RADOS. ceph-authtool -n client.radosgw.gateway

Re: btrfs big metadata

2012-07-02 Thread Gregory Farnum
On Sun, Jul 1, 2012 at 11:56 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hello list, I found several people who use the big metadata options (-n 64k -l 64k) for ceph, but I haven't found any ceph doc or info on why to use this. What's the reason to use the big metadata feature with ceph? One

Re: speedup ceph / scaling / find the bottleneck

2012-07-02 Thread Gregory Farnum
On Sun, Jul 1, 2012 at 11:12 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: On 2012-07-02 07:02, Alexandre DERUMIER wrote: Hi, my 2 cents: maybe with a lower range (like 100MB) of random I/O, you have a better chance of aggregating them into 4MB blocks? Yes, maybe. If you have just a

Re: Bug with ceph_mount and non-existent directory

2012-07-02 Thread Gregory Farnum
On Tue, Jun 26, 2012 at 8:20 PM, Noah Watkins jayh...@cs.ucsc.edu wrote: I get the following assert failure during cleanup if ceph_mount() is passed a non-existent directory, while ceph_mount() returns success. Nothing critical, but it got triggered by the Java unit test framework.

Re: MDS spinning wild after restart on all nodes

2012-07-02 Thread Gregory Farnum
Amon, I've been going through my backlog of flagged emails and came across this one. Did you ever get that information for the bug that you were going to try and find? -Greg On Fri, Jun 15, 2012 at 9:44 AM, Sage Weil s...@inktank.com wrote: On Fri, 15 Jun 2012, Amon Ott wrote: Hello all, I

Re: btrfs big metadata

2012-07-02 Thread Stefan Priebe
On 2012-07-02 18:42, Gregory Farnum wrote: On Sun, Jul 1, 2012 at 11:56 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hello list, I found several people who use the big metadata options (-n 64k -l 64k) for ceph, but I haven't found any ceph doc or info on why to use this. What's the reason

Re: Ceph and KVM live migration

2012-07-02 Thread Gregory Farnum
On Sat, Jun 30, 2012 at 8:21 PM, Vladimir Bashkirtsev vladi...@bashkirtsev.com wrote: On 01/07/12 11:59, Josh Durgin wrote: On 06/30/2012 07:15 PM, Vladimir Bashkirtsev wrote: On 01/07/12 10:47, Josh Durgin wrote: On 06/30/2012 05:42 PM, Vladimir Bashkirtsev wrote: Dear all, Currently I

Re: Rados faster than KVM block device?

2012-07-02 Thread Gregory Farnum
On Thu, Jun 28, 2012 at 2:17 PM, Stefan Priebe s.pri...@profihost.ag wrote: On 2012-06-28 18:12, Josh Durgin wrote: On 06/28/2012 06:10 AM, Stefan Priebe - Profihost AG wrote: Hello list, my cluster is now pretty stable; I'm just wondering about the sequential write values. With rados

Re: Ceph and KVM live migration

2012-07-02 Thread Josh Durgin
On 07/02/2012 11:21 AM, Gregory Farnum wrote: On Sat, Jun 30, 2012 at 8:21 PM, Vladimir Bashkirtsev vladi...@bashkirtsev.com wrote: On 01/07/12 11:59, Josh Durgin wrote: On 06/30/2012 07:15 PM, Vladimir Bashkirtsev wrote: On 01/07/12 10:47, Josh Durgin wrote: On 06/30/2012 05:42 PM,

Re: Ceph and KVM live migration

2012-07-02 Thread Christian Brunner
On Mon, Jul 02, 2012 at 11:21:40AM -0700, Gregory Farnum wrote: On Sat, Jun 30, 2012 at 8:21 PM, Vladimir Bashkirtsev vladi...@bashkirtsev.com wrote: On 01/07/12 11:59, Josh Durgin wrote: On 06/30/2012 07:15 PM, Vladimir Bashkirtsev wrote: On 01/07/12 10:47, Josh Durgin wrote: On

[PATCH] ceph.spec.in: Change license of base package to GPL and use SPDX format

2012-07-02 Thread Holger Macht
LGPLv2 in the spec file is not correct, because some of the included packages/binaries are GPLv2. For example: src/mount/mtab.c - package ceph, binary mount.ceph src/common/fiemap.cc - package ceph, binary rbd Also use the SPDX format (http://www.spdx.org/licenses) for the sub-package licenses.

Re: Qemu fails to open RBD image when auth_supported is not set to 'none'

2012-07-02 Thread Wido den Hollander
On 06/25/2012 05:45 PM, Wido den Hollander wrote: On 06/25/2012 05:20 PM, Wido den Hollander wrote: Hi, I just tried to start a VM with libvirt with the following disk: disk type='network' device='disk' driver name='qemu' type='raw' cache='none'/ source protocol='rbd'

Re: speedup ceph / scaling / find the bottleneck

2012-07-02 Thread Stefan Priebe
On 2012-07-02 18:51, Gregory Farnum wrote: On Sun, Jul 1, 2012 at 11:12 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: @sage / mark How does the aggregation work? Does it work 4MB blockwise or target-node based? Aggregation is based on the 4MB blocks, and if you've got caching
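The "4MB blockwise" answer can be illustrated with a small sketch: each write offset maps to an object index by integer division, and only writes landing in the same object are candidates for aggregation (the offsets below are hypothetical; 4MB is the assumed default object size):

```shell
# Map a few hypothetical write offsets to their 4MB RADOS object index.
# Offsets 0 and 1048576 (1MB) fall into object 0 and can be aggregated;
# offset 5242880 (5MB) falls into object 1 and cannot.
object_size=$((4 * 1024 * 1024))
for offset in 0 1048576 5242880; do
  echo "offset $offset -> object $((offset / object_size))"
done
```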

Re: Should an OSD crash when journal device is out of space?

2012-07-02 Thread Gregory Farnum
Hey guys, Thanks for the problem report. I've created an issue to track it at http://tracker.newdream.net/issues/2687. It looks like we just assume that if you're using a file, you've got enough space for it. It shouldn't be a big deal to at least do some startup checks which will fail gracefully.

Re: bad performance fio random write - rados bench random write to compare?

2012-07-02 Thread Gregory Farnum
On Tue, Jun 19, 2012 at 7:05 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, Is it possible to do a random write bench with the rados bench command? I have very bad random write performance with 4K block size inside qemu-kvm, 1000 iops/s max with 3 nodes with 3x 5 disk 15k (Maybe it's

Re: speedup ceph / scaling / find the bottleneck

2012-07-02 Thread Josh Durgin
On 07/02/2012 12:22 PM, Stefan Priebe wrote: On 2012-07-02 18:51, Gregory Farnum wrote: On Sun, Jul 1, 2012 at 11:12 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: @sage / mark How does the aggregation work? Does it work 4MB blockwise or target-node based? Aggregation is

Re: [PATCH] ceph.spec.in: Change license of base package to GPL and use SPDX format

2012-07-02 Thread Sage Weil
Applied, thanks! I think the mtab.c should eventually move to ceph-fs-common to mirror the debian package structure (mount.ceph, cephfs tool). It's probably not worth replacing fiemap.cc, though, until someone actually makes a plugin interface and wants to link to the ceph-osd binary that

Re: iostat show constants write to osd disk with writeahead journal, normal behaviour ?

2012-07-02 Thread Gregory Farnum
On Tue, Jun 19, 2012 at 12:09 AM, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, more info: I have activated filestore debug = 20, min interval 29 and max interval 30. I see sync_entry each 30s, so it seems to work as expected. cat ceph-osd.0.log |grep sync_entry 2012-06-19 07:56:00.084622

Re: Ceph status for Wheezy

2012-07-02 Thread Yehuda Sadeh
On Sat, Jun 30, 2012 at 10:56 PM, Laszlo Boszormenyi (GCS) g...@debian.hu wrote: Hi Sage, As previously noted, using leveldb caused some trouble over whether Ceph could be included in Wheezy or not. I've proposed that the supported architectures should be limited in Ceph and leveldb to the ones the

Re: [PATCH] rbd: fix the memory leak of bio_chain_clone

2012-07-02 Thread Yehuda Sadeh
On Mon, Jun 25, 2012 at 12:37 AM, Guangliang Zhao gz...@suse.com wrote: Signed-off-by: Guangliang Zhao gz...@suse.com --- drivers/block/rbd.c | 10 -- 1 files changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 65665c9..3d6dfc8

Re: Ceph status for Wheezy

2012-07-02 Thread Yehuda Sadeh
On Mon, Jul 2, 2012 at 2:13 PM, Yehuda Sadeh yeh...@inktank.com wrote: On Sat, Jun 30, 2012 at 10:56 PM, Laszlo Boszormenyi (GCS) g...@debian.hu wrote: Hi Sage, As previously noted, using leveldb caused some trouble over whether Ceph could be included in Wheezy or not. I've proposed that the

v0.48 argonaut released

2012-07-02 Thread Sage Weil
We're pleased to announce the release of Ceph v0.48, code-named argonaut. This release will be the basis of our first long-term stable branch. Although we will continue to make releases every 3-4 weeks, this stable release will be maintained with bug fixes and select non-destabilizing feature

Re: iostat show constants write to osd disk with writeahead journal, normal behaviour ?

2012-07-02 Thread Alexandre DERUMIER
Thanks, I'll try that. Note: with btrfs, I can use filestore flusher = true + the wip_flush_min git branch, and I see writes to disk every X seconds (2 sec of seq write, vs 30 sec with xfs, which doesn't work). With filestore flusher = false + wip_flush, I see constant writes; without flusher in

Re: speedup ceph / scaling / find the bottleneck

2012-07-02 Thread Alexandre DERUMIER
Stefan, as the fio benchmark uses direct I/O (--direct), maybe the writeback cache is not working? perfcounters should give us the answer. ----- Original message ----- From: Josh Durgin josh.dur...@inktank.com To: Stefan Priebe s.pri...@profihost.ag Cc: Gregory Farnum g...@inktank.com, Alexandre
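For reference, a direct-I/O random-write fio job of the kind presumably under discussion might look like this (a sketch; the job name, device path, and sizes are assumptions, not taken from the thread):

```ini
[randwrite-test]
rw=randwrite
bs=4k
direct=1             ; bypass the guest page cache, the point raised above
size=100m            ; the small range from the aggregation discussion
runtime=60
filename=/dev/vda    ; hypothetical RBD-backed virtio disk inside the VM
```

With direct=1 the guest page cache is bypassed, which is why the question about whether the rbd writeback cache still takes effect is worth checking via the perfcounters.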
