Re: Designing a cluster guide

2012-05-22 Thread Stefan Priebe - Profihost AG
On 21.05.2012 20:13, Gregory Farnum wrote: On Sat, May 19, 2012 at 1:37 AM, Stefan Priebe s.pri...@profihost.ag wrote: So would you recommend a fast (more GHz) Core i3 instead of a single Xeon for this system? (price per GHz is better). If that's all the MDS is doing there, probably? (It

Re: mkfs on osd - failed in 0.47

2012-05-22 Thread Sławomir Skowron
One more thing: === osd.0 === 2012-05-22 10:14:09.801059 7ffc4414a780 -1 filestore(/vol0/data/osd.0) leveldb db created 2012-05-22 10:14:09.804227 7ffc4414a780 -1 filestore(/vol0/data/osd.0) limited size xattrs -- enable filestore_xattr_use_omap 2012-05-22 10:14:09.804250 7ffc4414a780 -1
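For reference, the omap option that the log message points at is an OSD setting; on a filesystem with limited xattrs it would be enabled in ceph.conf roughly like this (a sketch, not taken from this thread):
    [osd]
        filestore xattr use omap = true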

Re: Designing a cluster guide

2012-05-22 Thread Jerker Nyberg
On Mon, 21 May 2012, Gregory Farnum wrote: This one: the write is considered safe once it is on-disk on all OSDs currently responsible for hosting the object. Is it possible to configure the client to consider the write successful when the data is hitting RAM on all the OSDs but not yet

write cache disabling recommendations for journal and storage disks ?

2012-05-22 Thread Alexandre DERUMIER
Hi, I have some questions about disabling write cache http://ceph.com/docs/master/config-cluster/file-system-recommendations/ Ceph aims for data safety, which means that when the application receives notice that data was written to the disk, that data was actually written to the disk. For old
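As background for the question, the volatile write cache on an individual SATA/SAS drive is usually toggled with hdparm; a minimal sketch, with the device name as a placeholder:
    hdparm -W 0 /dev/sdb    # disable the drive's volatile write cache
    hdparm -W 1 /dev/sdb    # re-enable it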

broken init script under debian squeeze

2012-05-22 Thread Stefan Priebe - Profihost AG
Hi, the ceph init script seems to be broken - at least under debian lenny. It simply does nothing on a configured osd / mon server, so at system start the daemons won't start. How do I have to configure the hostname to match anything special in ceph.conf? [root@system123 /etc]#
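For context, the sysvinit script starts only the daemons whose sections in ceph.conf carry a host entry matching the machine's short hostname; a minimal sketch of such a stanza (names are illustrative):
    [osd.0]
        host = system123    # must match `hostname -s` on that machine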

Re: broken init script under debian squeeze

2012-05-22 Thread SPONEM, Benoît
Hi, Same issue for me under debian wheezy. Greets, Benoît On 22/05/2012 12:05, Stefan Priebe - Profihost AG wrote: Hi, the ceph init script seems to be broken - at least under debian lenny. It simply does nothing on a configured osd / mon server, so at system start the daemons

Re: Ceph on btrfs 3.4rc

2012-05-22 Thread Christian Brunner
2012/5/21 Miao Xie mi...@cn.fujitsu.com: Hi Josef, On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote: diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..492c74f 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct

Snapshot/Clone in RBD

2012-05-22 Thread Eric_YH_Chen
Hi all: According to the document, a snapshot in RBD is read-only. That is to say, if I want to clone the image, I should use rbd_copy. Right? I am curious whether this function is optimized, e.g. copy-on-write, to speed up performance. What I want to do is to integrate
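For reference, the snapshot-then-copy workflow being asked about looks roughly like this with the rbd CLI (pool and image names are illustrative, the exact snapshot syntax differs between versions, and a plain copy is a full data copy, not copy-on-write):
    rbd snap create rbd/myimage@snap1
    rbd cp rbd/myimage rbd/myimage-copy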

Re: Snapshot/Clone in RBD

2012-05-22 Thread Tomasz Paszkowski
Hi, at this moment there's no support for cloning images in RBD. The only way to create a new image is to copy the old one through the client interface. Sage mentioned that a new release will include image layering, which will be a copy-on-write cloning implementation. On Tue, May 22, 2012 at 12:39 PM,

Re: how to mount a specific pool in cephs

2012-05-22 Thread Grant Ashman
Tommi Virtanen tommi.virtanen at dreamhost.com writes: You don't mount pools directly; there's filesystem metadata (as managed by metadata servers) that is needed too. What you probably want is to specify that a subtree of your ceph dfs stores the file data in a separate pool, using cephfs
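The end-to-end sequence being described would look roughly like this (pool name, pool id, and mount point are illustrative; the exact cephfs flags, and whether add_data_pool wants a name or a numeric id, depend on the version, which is where this thread runs into trouble):
    ceph osd pool create backup 128
    ceph mds add_data_pool 3                  # the new pool's id
    cephfs /mnt/ceph/backup set_layout -p 3 -u 4194304 -c 1 -s 4194304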

Re: PGs stuck in creating state

2012-05-22 Thread Vladimir Bashkirtsev
On 08/05/12 01:26, Sage Weil wrote: On Mon, 7 May 2012, Vladimir Bashkirtsev wrote: On 20/04/12 14:41, Sage Weil wrote: On Fri, 20 Apr 2012, Vladimir Bashkirtsev wrote: Dear devs, First of all I would like to bow my head at your great effort! Even if ceph did not reach prime time status yet

how to debug slow rbd block device

2012-05-22 Thread Stefan Priebe - Profihost AG
Hi list, my ceph block test cluster is now running fine. Setup: 4x ceph servers - 3x mon with /mon on local OS SATA disk - 4x OSD with /journal on tmpfs and /srv on Intel SSD; all of them use a 2x 1Gbit/s LACP trunk. 1x KVM host system (2x 1Gbit/s LACP trunk). With one KVM I do not get more

Re: KVM/RBD Block device hangs

2012-05-22 Thread Stefan Priebe - Profihost AG
On 21.05.2012 22:57, Gregory Farnum wrote: On Mon, May 21, 2012 at 1:51 PM, Stefan Priebe s.pri...@profihost.ag wrote: On 21.05.2012 16:59, Wido den Hollander wrote: Probably after, but both are fine. I just want to know how your cluster is doing and what the PG states are. will do so.

Re: Ceph on btrfs 3.4rc

2012-05-22 Thread Josef Bacik
On Mon, May 21, 2012 at 11:59:54AM +0800, Miao Xie wrote: Hi Josef, On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote: diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..492c74f 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9

Re: write cache disabling recommendations for journal and storage disks ?

2012-05-22 Thread Sage Weil
On Tue, 22 May 2012, Alexandre DERUMIER wrote: Hi, I have some questions about disabling write cache http://ceph.com/docs/master/config-cluster/file-system-recommendations/ Ceph aims for data safety, which means that when the application receives notice that data was written to the disk,

Re: how to debug slow rbd block device

2012-05-22 Thread Andrey Korolyov
Hi, I ran into almost the same problem about two months ago, and there are a couple of corner cases: near-default TCP parameters, small journal size, disks that are not backed by a controller with NVRAM cache, and high load on the OSDs' CPU caused by side processes. Finally, I was able to achieve 115Mb/s
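As an illustration of the journal-size point, the journal is sized per OSD in ceph.conf; a hedged sketch with example values only:
    [osd]
        osd journal = /journal/$name
        osd journal size = 4096    # in MB; a small journal throttles bursts of writes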

Re: write cache disabling recommendations for journal and storage disks ?

2012-05-22 Thread Alexandre DERUMIER
Thanks Sage, yes, newer kernels don't need the barrier option since 2.6.37 if I remember correctly (support for REQ_FLUSH/FUA). Just to be sure: if the client does an fsync or fdatasync, does the write go only to the journal and get flushed to disk after 30 seconds, or does it force the write to be committed

Re: write cache disabling recommendations for journal and storage disks ?

2012-05-22 Thread Sage Weil
On Tue, 22 May 2012, Alexandre DERUMIER wrote: Thanks Sage, yes, newer kernels don't need the barrier option since 2.6.37 if I remember correctly (support for REQ_FLUSH/FUA). Just to be sure: if the client does an fsync or fdatasync, does the write go only to the journal and get flushed
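For reference, the flush interval Alexandre is asking about corresponds to the filestore sync settings in ceph.conf; a hedged sketch, with values that are only examples:
    [osd]
        filestore min sync interval = 0.01
        filestore max sync interval = 30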

Re: write cache disabling recommendations for journal and storage disks ?

2012-05-22 Thread Alexandre DERUMIER
Yes, I was talking about client fsync. (Sorry, I was not clear in my mail.) But thanks for the information about fsync in the ceph osd; I understand now how things work. About faking: I just want to say that if the client does an fsync, the fsync doesn't force a flush to the disk platter but only to

Re: broken init script under debian squeeze

2012-05-22 Thread Yehuda Sadeh
On Tue, May 22, 2012 at 3:05 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hi, the ceph init script seems to be broken - at least under debian lenny. It simply does nothing on a configured osd / mon server, so at system start the daemons won't start. How do I have to

Re: Huge MDS log crashing the cluster

2012-05-22 Thread Tommi Virtanen
On Tue, May 22, 2012 at 4:03 AM, Madhusudhana U madhusudhana.u.acha...@gmail.com wrote: Hi, I have dedicated 10G of space for the /var partition. But the MDS log has filled the entire /var partition and my cluster is not responding. How shall I fix this? And what do I need to do so that the /var partition
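If a high debug level is what is filling the disk, lowering the MDS verbosity in ceph.conf (and rotating the log) is the usual remedy; a hedged sketch:
    [mds]
        debug mds = 1
        log file = /var/log/ceph/$name.log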

Re: Ceph on btrfs 3.4rc

2012-05-22 Thread Josef Bacik
On Tue, May 22, 2012 at 12:29:59PM +0200, Christian Brunner wrote: 2012/5/21 Miao Xie mi...@cn.fujitsu.com: Hi Josef, On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote: diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 9b9b15f..492c74f 100644 ---

Re: MDS crash, wont startup again

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 3:12 AM, Felix Feinhals wrote: I am not quite sure how to get you the coredump info. I installed all ceph-dbg packages and executed: gdb /usr/bin/ceph-mds core snip GNU gdb (GDB) 7.0.1-debian Copyright (C) 2009 Free Software Foundation, Inc. License

Re: mkfs on osd - failed in 0.47

2012-05-22 Thread Sławomir Skowron
Ok, now it is clear to me. I'll disable filestore_xattr_use_omap for now, and try to move the puppet class to xfs for a new cluster init :) Thanks On Tue, May 22, 2012 at 7:47 PM, Greg Farnum g...@inktank.com wrote: On Tuesday, May 22, 2012 at 1:21 AM, Sławomir Skowron wrote: One more thing:

Re: how to mount a specific pool in cephs

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 5:02 AM, Grant Ashman wrote: Tommi Virtanen tommi.virtanen at dreamhost.com writes: You don't mount pools directly; there's filesystem metadata (as managed by metadata servers) that is needed too. What you probably want is to

RGW, future directions

2012-05-22 Thread Yehuda Sadeh
RGW is maturing. Besides looking at performance, which highly ties into RADOS performance, we'd like to hear whether there are certain pain points or future directions that you (you as in the ceph community) would like to see us taking. There are a few directions that we were thinking about: 1.

Re: RGW, future directions

2012-05-22 Thread Sławomir Skowron
On Tue, May 22, 2012 at 8:07 PM, Yehuda Sadeh yeh...@inktank.com wrote: RGW is maturing. Besides looking at performance, which highly ties into RADOS performance, we'd like to hear whether there are certain pain points or future directions that you (you as in the ceph community) would like to

Re: RGW, future directions

2012-05-22 Thread Yehuda Sadeh
On Tue, May 22, 2012 at 11:25 AM, Sławomir Skowron szi...@gmail.com wrote: On Tue, May 22, 2012 at 8:07 PM, Yehuda Sadeh yeh...@inktank.com wrote: RGW is maturing. Besides looking at performance, which highly ties into RADOS performance, we'd like to hear whether there are certain pain points

Re: how to debug slow rbd block device

2012-05-22 Thread Stefan Priebe
On 22.05.2012 16:52, Andrey Korolyov wrote: Hi, I ran into almost the same problem about two months ago, and there are a couple of corner cases: near-default TCP parameters, small journal size, disks that are not backed by a controller with NVRAM cache and high load on the OSDs' CPU caused by side

Re: broken init script under debian squeeze

2012-05-22 Thread Stefan Priebe
On 22.05.2012 18:08, Yehuda Sadeh wrote: On Tue, May 22, 2012 at 3:05 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hi, the ceph init script seems to be broken - at least under debian lenny. It simply does nothing on a configured osd / mon server, so at system start the

Re: how to debug slow rbd block device

2012-05-22 Thread Greg Farnum
What does your test look like? With multiple large IOs in flight we can regularly fill up a 1GbE link on our test clusters. With smaller or fewer IOs in flight performance degrades accordingly. On Tuesday, May 22, 2012 at 5:45 AM, Stefan Priebe - Profihost AG wrote: Hi list, my ceph
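A quick way to see the concurrency effect Greg describes is rados bench with different numbers of writes in flight (pool name is illustrative):
    rados -p rbd bench 60 write -t 1 -b 4194304     # one 4MB write at a time
    rados -p rbd bench 60 write -t 16 -b 4194304    # sixteen writes in flight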

Re: build errors centos 6.2

2012-05-22 Thread Matt Weil
build/obj/util.do: Compiling dynamic object src/simplexml.c:27:27: error: libxml/parser.h: No such file or directory src/simplexml.c:48: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'saxGetEntity' cc1: warnings being treated as errors src/simplexml.c:56: error: type defaults
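The missing header usually just means the libxml2 development package is not installed; on CentOS 6 that would be roughly:
    yum install libxml2-devel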

Re: how to debug slow rbd block device

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 12:40 PM, Stefan Priebe wrote: On 22.05.2012 21:35, Greg Farnum wrote: What does your test look like? With multiple large IOs in flight we can regularly fill up a 1GbE link on our test clusters. With smaller or fewer IOs in flight performance degrades

Re: RGW, future directions

2012-05-22 Thread Sławomir Skowron
On Tue, May 22, 2012 at 9:09 PM, Yehuda Sadeh yeh...@inktank.com wrote: On Tue, May 22, 2012 at 11:25 AM, Sławomir Skowron szi...@gmail.com wrote: On Tue, May 22, 2012 at 8:07 PM, Yehuda Sadeh yeh...@inktank.com wrote: RGW is maturing. Besides looking at performance, which highly ties into

Re: how to debug slow rbd block device

2012-05-22 Thread Stefan Priebe
On 22.05.2012 21:52, Greg Farnum wrote: On Tuesday, May 22, 2012 at 12:40 PM, Stefan Priebe wrote: Huh. That's less than I would expect. Especially since it ought to be going through the page cache. What version of RBD is KVM using here? v0.47.1 Can you (from the KVM host) run rados -p

Re: ceph osd crush add - uknown command crush

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 1:15 PM, Sławomir Skowron wrote: /usr/bin/ceph -v ceph version 0.47.1 (commit:f5a9404445e2ed5ec2ee828aa53d73d4a002f7a5) root@obs-10-177-66-4:/# /usr/bin/ceph osd crush add 1 osd.1 1.0 pool=default rack=unknownrack host=obs-10-177-66-4 root@obs-10-177-66-4:/#

Re: how to debug slow rbd block device

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 1:30 PM, Stefan Priebe wrote: On 22.05.2012 21:52, Greg Farnum wrote: On Tuesday, May 22, 2012 at 12:40 PM, Stefan Priebe wrote: Huh. That's less than I would expect. Especially since it ought to be going through the page cache. What version of RBD is KVM

Re: build errors centos 6.2

2012-05-22 Thread Yehuda Sadeh
On Tue, May 22, 2012 at 12:42 PM, Matt Weil mw...@genome.wustl.edu wrote: build/obj/util.do: Compiling dynamic object src/simplexml.c:27:27: error: libxml/parser.h: No such file or directory src/simplexml.c:48: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'saxGetEntity'

Re: how to debug slow rbd block device

2012-05-22 Thread Stefan Priebe
On 22.05.2012 22:48, Mark Nelson wrote: Can you use something like iostat or collectl to check and see if the write throughput to each SSD is roughly equal during your tests? It is, but just around 20-40MB/s. But they can write 260MB/s with sequential writes. Also, what FS are you using and
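For the per-device check Mark suggests, iostat from the sysstat package reports per-disk throughput once per second:
    iostat -xm 1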

Re: how to debug slow rbd block device

2012-05-22 Thread Stefan Priebe
On 22.05.2012 22:49, Greg Farnum wrote: Anyway, it looks like you're just paying a synchronous write penalty What exactly does that mean? Shouldn't one threaded write to four 260MB/s devices give at least 100Mb/s? since with 1 write at a time you're getting 30-40MB/s out of rados bench,

Re: how to debug slow rbd block device

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 2:00 PM, Stefan Priebe wrote: On 22.05.2012 22:49, Greg Farnum wrote: Anyway, it looks like you're just paying a synchronous write penalty What exactly does that mean? Shouldn't one threaded write to four 260MB/s devices give at least 100Mb/s? Well,
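To make the penalty concrete: a single synchronous stream is bounded by object size divided by round-trip latency, not by raw disk speed. If one 4MB write takes roughly 100 ms to hit the journal, replicate, and be acknowledged, that stream tops out at about 4MB / 0.1s = 40MB/s no matter how fast each SSD is; the latency figure here is only illustrative.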

Re: how to mount a specific pool in cephs

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 2:12 PM, Grant Ashman wrote: That's the right pool ID; yes. I believe the problem is that the cephfs tool currently requires you to fill in all the fields, not just the one you wish to change. Try that (setting all the other values to match what you see when

Re: how to mount a specific pool in cephs

2012-05-22 Thread Grant Ashman
Greg Farnum greg at inktank.com writes: Oh, I got this conversation confused with another one. You also need to specify the pool as a valid pool to store filesystem data in, if you haven't done that already: ceph mds add_data_pool poolname Thanks Greg. However, I still get the same error

Re: how to mount a specific pool in cephs

2012-05-22 Thread Greg Farnum
On Tuesday, May 22, 2012 at 2:31 PM, Grant Ashman wrote: Greg Farnum greg at inktank.com writes: Oh, I got this conversation confused with another one. You also need to specify the pool as a valid pool to store filesystem data in, if you haven't done that already:

RE: how to mount a specific pool in cephs

2012-05-22 Thread Grant Ashman
That's the right pool ID; yes. I believe the problem is that the cephfs tool currently requires you to fill in all the fields, not just the one you wish to change. Try that (setting all the other values to match what you see when you view the layout). :) - Greg Hi, I've tried setting all

Re: how to mount a specific pool in cephs

2012-05-22 Thread Grant Ashman
Greg Farnum greg at inktank.com writes: When I specify the add data pool I get the following: (with or without the additional values) root at dsan-test:~# ceph mds add_data_pool backup added data pool 0 to mdsmap Okay, that's not right — it should say pool 3. According to the docs

Re: how to mount a specific pool in cephs

2012-05-22 Thread Gregory Farnum
On Tuesday, May 22, 2012 at 2:51 PM, Grant Ashman wrote: Awesome, that seemed to work! However, I feel a bit silly - what I'm after is: /mnt/ceph-data - mounted to pool 0 (data) /mnt/ceph-backup - mounted to pool 3 (backup) but this seemed to change both to mount to pool 3? Am I simply
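For the layout Grant describes, both directories live in the same filesystem; each directory gets its own layout, and a layout only applies to files created after it is set. The client then mounts the subtrees separately, roughly like this (monitor address and directory names are placeholders):
    mount -t ceph 192.168.0.1:6789:/data /mnt/ceph-data
    mount -t ceph 192.168.0.1:6789:/backup /mnt/ceph-backup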