Re: OSD deadlock with cephfs client and OSD on same machine

2012-11-06 Thread Amon Ott
On 05.11.2012 21:17, Cláudio Martins wrote: > On Fri, 1 Jun 2012 11:35:37 +0200 Amon Ott wrote: >> After backporting syncfs() support into Debian stable libc6 2.11 and recompiling Ceph with it, our test cluster is now running with syncfs(). > Hi, > We're running OSDs on top
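
For anyone on an older glibc who does not want to patch libc itself, here is a minimal sketch (not Amon's actual backport) of reaching the syscall directly, assuming the running kernel is >= 2.6.39; the syscall number shown is x86_64 only and the /data/osd path is purely illustrative:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* Older glibc (e.g. 2.11) has no syncfs() wrapper, but a >= 2.6.39
     * kernel still provides the syscall, so it can be invoked directly. */
    #ifndef SYS_syncfs
    #define SYS_syncfs 306          /* x86_64; the number differs per arch */
    #endif

    static int my_syncfs(int fd)
    {
        return syscall(SYS_syncfs, fd);
    }

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "/data/osd";  /* example path */
        int fd = open(path, O_RDONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (my_syncfs(fd) < 0)
            fprintf(stderr, "syncfs(%s) failed: %s\n", path, strerror(errno));
        else
            printf("syncfs(%s) ok\n", path);
        close(fd);
        return 0;
    }

Either way, Ceph itself still has to be compiled against headers/libc that expose syncfs() for the FileStore to use it, which is what the recompile mentioned above addresses.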

Re: RBD trim / unmap support?

2012-11-06 Thread Stefan Priebe - Profihost AG
On 03.11.2012 19:47, Stefan Priebe wrote: Hi Josh, thanks - here it comes: https://www.dropbox.com/s/blegk8u14n30ntq/librbd.log.gz Is this one OK? Do you get an idea what this could be? I checked this again today with the latest git and the wip-rbd-read branch, but I still get the same I/O errors

Re: [PATCH 1/2] mds: Don't acquire replica object's versionlock

2012-11-06 Thread Yan, Zheng
On 11/06/2012 02:52 AM, Sage Weil wrote: > On Thu, 1 Nov 2012, Yan, Zheng wrote: >> From: "Yan, Zheng" >> Both CInode's and CDentry's versionlocks are of type LocalLock. >> Acquiring the LocalLock in a replica object is useless and problematic. >> For example, if two requests try acquiring a replica ob

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-06 Thread Alexandre DERUMIER
>> Most mem gets used while doing a lot of small reads. I see the same problem with my test and wip-rbd-read. I also got strange fio results, from 5k to 100k iops (increasing over time during the bench). - Original message - From: "Stefan Priebe" To: "Sage Weil" Cc: ceph-devel@vg

Re: Large numbers of OSD per node

2012-11-06 Thread Wido den Hollander
On 06-11-12 03:05, Andrew Thrift wrote: Mark, Wido, Thank you very much for your informed responses. You're welcome! What you have mentioned makes a lot of sense. If we had a single node completely fail, we would have 72TB of data that needed to be replicated to a new OSD. This would ta

Re: Fs to use?

2012-11-06 Thread Wido den Hollander
On 06-11-12 07:54, Stefan Priebe - Profihost AG wrote: Hello list, is there any recommendation regarding the fs? I mean, btrfs is still experimental; would you still use it with Ceph in production? Do I need big metadata with btrfs? (It seems to make btrfs slow.) XFS seems to be the recommendation

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-06 Thread Stefan Priebe - Profihost AG
On 06.11.2012 10:07, Alexandre DERUMIER wrote: Most mem gets used while doing a lot of small reads. I see the same problem with my test and wip-rbd-read. I also got strange fio results, from 5k to 100k iops (increasing over time during the bench). Update to latest wip-rbd-read from ye

Re: Large numbers of OSD per node

2012-11-06 Thread Wido den Hollander
On 06-11-12 10:36, Gandalf Corvotempesta wrote: 2012/11/6 Wido den Hollander : You shouldn't only think about a complete failure solution. The distributed architecture of Ceph also gives you the freedom to take out a node whenever you want to do maintenance or just don't trust the node and you

Re: changing pg_num / pgp_num after adding more osds

2012-11-06 Thread Sage Weil
On Tue, 6 Nov 2012, Stefan Priebe - Profihost AG wrote: > On 06.11.2012 00:45, Josh Durgin wrote: > > On 11/05/2012 06:14 AM, Stefan Priebe - Profihost AG wrote: > > > Hello list, > > > Is there a way to change the number of pg_num / pgp_num after adding more osds? > > The pg_nu

Re: Large numbers of OSD per node

2012-11-06 Thread Stefan Kleijkers
On 11/06/2012 11:24 AM, Gandalf Corvotempesta wrote: 2012/11/6 Wido den Hollander : The setup described on that page has 90 nodes, so one node failing is a little over 1% of the cluster failing. I think I'm missing something. In case of a failure, they will always have to resync 36 TB of da
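
Rough arithmetic behind that "little over 1%" point, assuming the 90-node example and roughly 36 TB of data per node: when one node fails, its PGs are re-replicated across the 89 surviving nodes, so each surviving node absorbs only about

    36 TB / 89 nodes ≈ 0.4 TB of new data per node

rather than any single node having to take on the whole 36 TB.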

Re: changing pg_num / pgp_num after adding more osds

2012-11-06 Thread Stefan Kleijkers
On 11/06/2012 11:59 AM, Sage Weil wrote: When I have one pool with 800 PGs and I add 20 new OSDs, how does a new pool help? I mean the old pools will stay with 800 PGs. It makes the distribution of existing data less coarse, but as the size of the PGs for the new pool increases things will tend
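
For what it's worth, a hedged librados-level sketch of the "new pool" route; the pool name and PG counts are made up, and depending on the version the monitor's own osd_pool_default_pg_num may take precedence over a client-side override (the ceph CLI also lets you pass the PG count explicitly when creating a pool), so verify afterwards:

    #include <rados/librados.h>
    #include <stdio.h>

    int main(void)
    {
        rados_t cluster;
        int ret;

        if (rados_create(&cluster, NULL) < 0)        /* connect as client.admin */
            return 1;
        rados_conf_read_file(cluster, NULL);         /* default ceph.conf */

        /* Ask for a larger PG count for newly created pools; check the result
         * with "ceph osd dump". Values here are only examples. */
        rados_conf_set(cluster, "osd_pool_default_pg_num", "2048");
        rados_conf_set(cluster, "osd_pool_default_pgp_num", "2048");

        if (rados_connect(cluster) < 0) {
            rados_shutdown(cluster);
            return 1;
        }

        ret = rados_pool_create(cluster, "rbd-2048");   /* hypothetical pool name */
        if (ret < 0)
            fprintf(stderr, "pool create failed: %d\n", ret);

        rados_shutdown(cluster);
        return ret < 0 ? 1 : 0;
    }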

Re: Large numbers of OSD per node

2012-11-06 Thread Stefan Kleijkers
On 11/06/2012 12:31 PM, Gandalf Corvotempesta wrote: 2012/11/6 Stefan Kleijkers : Well, you have to keep in mind that when a node fails, the PGs that resided on that node have to be redistributed over all the other nodes. So you begin moving about 1% of the data between all the remaining nodes/os

Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Oliver Francke
Hi *, anybody out there who's on Ubuntu 12.04.1/ in connection with libc + xfs + syncfs? We bit the bullet and reinstalled two new nodes from Debian to precise in favour of a possible performance increase?! *sigh*, still getting: 2012-11-06 17:05:51.863921 7f5cc52e3780 0 filestore(/data/osd

Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Oliver Francke
In answer to myself, On 11/06/2012 05:14 PM, Oliver Francke wrote: Hi *, anybody out there who's on Ubuntu 12.04.1/ in connection with libc + xfs + syncfs? ii libc6 2.15-0ubuntu10.3 but: apt-cache show... . . Provides: glibc-2.13-1 . . playing with numbers, I th

Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Alexandre DERUMIER
syncfs is only available since glibc 2.14 (and kernel 2.6.39). - Original message - From: "Oliver Francke" To: ceph-devel@vger.kernel.org Sent: Tuesday, 6 November 2012 17:31:29 Subject: Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend In answer to myself, On 11/06/2012 05:14 PM,
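
A quick way to see the distinction that bites here - which glibc a binary was compiled against versus which one it is running on. Ubuntu 12.04 ships glibc 2.15, but binaries compiled on squeeze's 2.11 will not have syncfs() support built in. Minimal sketch:

    #include <stdio.h>
    #include <gnu/libc-version.h>

    int main(void)
    {
        /* The glibc this program is running against right now. */
        printf("runtime glibc: %s\n", gnu_get_libc_version());

        /* The glibc this program was compiled against; the same distinction
         * applies to the ceph-osd binary, which only uses syncfs() if it was
         * built against a glibc (>= 2.14) that declares it. */
        printf("compile-time glibc: %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
        return 0;
    }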

Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Oliver Francke
Hi Jens, sorry for the double work… answered off-list already ;) Oliver. On 06.11.2012 at 19:46, Jens Rehpöhler wrote: > On 06.11.2012 18:33, Gandalf Corvotempesta wrote: >> 2012/11/6 Oliver Francke : >>> 2012-11-06 17:05:51.863921 7f5cc52e3780 0 filestore(/data/osd6-3) mount >>> syncfs(2) s

Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Jens Rehpöhler
On 06.11.2012 18:33, Gandalf Corvotempesta wrote: > 2012/11/6 Oliver Francke : >> 2012-11-06 17:05:51.863921 7f5cc52e3780 0 filestore(/data/osd6-3) mount >> syncfs(2) syscall not support by glibc >> 2012-11-06 17:05:51.863925 7f5cc52e3780 0 filestore(/data/osd6-3) mount no >> syncfs(2), must use

Cleaning up CephFS pool

2012-11-06 Thread Denis Fondras
Hello all, I noticed that removing files on CephFS doesn't reclaim free space using ceph version 0.52 (commit: e48859474c4944d4ff201ddc9f5fd400e8898173). I created two 500 GB files on CephFS (mounted with ceph-fuse) and then removed them. Now "rados df -p data" shows 1 GB of usage: --8<

Re: Cleaning up CephFS pool

2012-11-06 Thread Gregory Farnum
Deleting files on CephFS doesn't instantaneously remove the underlying objects because they could still be in use by other clients, and removal takes time proportional to the size of the file. Instead, the MDS queues it up to be removed asynchronously. You should see the number of objects counter go
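
A small sketch of watching that counter drain via the librados C API (the pool name "data" matches the original report; the polling interval is arbitrary):

    #include <rados/librados.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        struct rados_pool_stat_t st;
        int i;

        if (rados_create(&cluster, NULL) < 0) return 1;
        rados_conf_read_file(cluster, NULL);
        if (rados_connect(cluster) < 0) return 1;
        if (rados_ioctx_create(cluster, "data", &io) < 0) return 1;

        /* Poll the CephFS data pool; num_objects should drop over time as the
         * MDS removes the objects of deleted files in the background. */
        for (i = 0; i < 12; i++) {
            if (rados_ioctx_pool_stat(io, &st) == 0)
                printf("objects=%llu bytes=%llu\n",
                       (unsigned long long)st.num_objects,
                       (unsigned long long)st.num_bytes);
            sleep(5);
        }

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }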

Re: Cleaning up CephFS pool

2012-11-06 Thread Denis Fondras
On 06/11/2012 20:24, Gregory Farnum wrote: Deleting files on CephFS doesn't instantaneously remove the underlying objects because they could still be in use by other clients, and removal takes time proportional to the size of the file. Instead, the MDS queues it up to be removed asynchronously

Re: What would a good OSD node hardware configuration look like?

2012-11-06 Thread Josh Durgin
On 11/05/2012 06:49 PM, Dennis Jacobfeuerborn wrote: On 11/06/2012 01:14 AM, Josh Durgin wrote: On 11/05/2012 09:13 AM, Dennis Jacobfeuerborn wrote: Hi, I'm thinking about building a ceph cluster and I'm wondering what a good configuration would look like for 4-8 (and maybe more) 2HU 8-disk or

Re: Cleaning up CephFS pool

2012-11-06 Thread Gregory Farnum
On Tue, Nov 6, 2012 at 11:29 AM, Denis Fondras wrote: > On 06/11/2012 20:24, Gregory Farnum wrote: >> Deleting files on CephFS doesn't instantaneously remove the underlying objects because they could still be in use by other clients, and removal takes time proportional to the size of

Re: RBD trim / unmap support?

2012-11-06 Thread Stefan Priebe
On 02.11.2012 17:16, Josh Durgin wrote: On 11/02/2012 02:08 AM, Stefan Priebe - Profihost AG wrote: Hi Josh, I tried scsi-hd with discard_granularity=512. The UNMAP / trim itself works fine and the space is freed at Ceph, but at the VM I'm getting: [ 75.076895] sd 2:0:0:4: [sdc] [ 75.078353] Re
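
For context, at the librbd layer the guest's UNMAP ultimately becomes a discard of a byte range on the image; a minimal sketch using the librbd C API (pool and image names are made up):

    #include <rados/librados.h>
    #include <rbd/librbd.h>
    #include <stdio.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        rbd_image_t image;
        int ret;

        if (rados_create(&cluster, NULL) < 0) return 1;
        rados_conf_read_file(cluster, NULL);
        if (rados_connect(cluster) < 0) return 1;
        if (rados_ioctx_create(cluster, "rbd", &io) < 0) return 1;

        ret = rbd_open(io, "vm-disk", &image, NULL);    /* hypothetical image */
        if (ret < 0) {
            fprintf(stderr, "rbd_open: %d\n", ret);
            return 1;
        }

        /* Discard 4 MB at offset 0; the freed space should eventually show up
         * in "rados df" once the backing objects are removed or truncated. */
        ret = rbd_discard(image, 0, 4 * 1024 * 1024);
        if (ret < 0)
            fprintf(stderr, "rbd_discard: %d\n", ret);

        rbd_close(image);
        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return ret < 0 ? 1 : 0;
    }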

Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Dan Mick
Resolution: installing the packages built for precise, rather than squeeze, got versions that use syncfs. On 11/06/2012 08:31 AM, Oliver Francke wrote: In answer to myself, On 11/06/2012 05:14 PM, Oliver Francke wrote: Hi *, anybody out there who's in Ubuntu 12.04.1/ in connection with libc

Re: What would a good OSD node hardware configuration look like?

2012-11-06 Thread Dennis Jacobfeuerborn
On 11/06/2012 08:30 PM, Josh Durgin wrote: > On 11/05/2012 06:49 PM, Dennis Jacobfeuerborn wrote: >> On 11/06/2012 01:14 AM, Josh Durgin wrote: >>> On 11/05/2012 09:13 AM, Dennis Jacobfeuerborn wrote: Hi, I'm thinking about building a ceph cluster and I'm wondering what a good config

Re: Add monitor problems

2012-11-06 Thread Mandell Degerness
Sorry, all. It turns out the problems were entirely on our side (bad ceph.conf files on the new servers). On Tue, Nov 6, 2012 at 11:09 AM, Mandell Degerness wrote: > I'm seeing some weird errors when I add multiple monitors and I'm > hoping the list can shed some light to let me know if I need t

Debugging lsetxattr / llistxattr

2012-11-06 Thread Noah Watkins
Hi all, I'm having some difficulty debugging a symlink xattr problem, though luckily it is reproducible. A basic test that adds an xattr to a symlink: ceph_lsetxattr(symlink path, ...); ceph_llistxattr(symlink path, ...); results in a return value of 0 from llistxattr (no xattrs found). H
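
Roughly, the reproducer spelled out with the libcephfs C API (the mount root, paths and xattr name are placeholders; the signatures follow the usual l*xattr pattern):

    #include <cephfs/libcephfs.h>
    #include <stdio.h>

    int main(void)
    {
        struct ceph_mount_info *cmount;
        char list[1024];
        int ret;

        ceph_create(&cmount, NULL);
        ceph_conf_read_file(cmount, NULL);
        ceph_mount(cmount, "/");

        /* Placeholder paths: a symlink pointing at some target. */
        ceph_symlink(cmount, "/target", "/link");

        ret = ceph_lsetxattr(cmount, "/link", "user.test", "1", 1, 0);
        printf("ceph_lsetxattr: %d\n", ret);

        /* A return value of 0 here means no xattr names came back
         * for the symlink, which is the behaviour described above. */
        ret = ceph_llistxattr(cmount, "/link", list, sizeof(list));
        printf("ceph_llistxattr: %d\n", ret);

        ceph_unmount(cmount);
        ceph_release(cmount);
        return 0;
    }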

Re: What would a good OSD node hardware configuration look like?

2012-11-06 Thread Wido den Hollander
On 07-11-12 02:35, Dennis Jacobfeuerborn wrote: On 11/06/2012 08:30 PM, Josh Durgin wrote: On 11/05/2012 06:49 PM, Dennis Jacobfeuerborn wrote: On 11/06/2012 01:14 AM, Josh Durgin wrote: On 11/05/2012 09:13 AM, Dennis Jacobfeuerborn wrote: Hi, I'm thinking about building a ceph cluster and