Re: still crashing osds with next branch

2012-06-20 Thread Stefan Priebe - Profihost AG
On 21.06.2012 00:56, Sage Weil wrote: Just a quick update: there were some problems with doing a rolling upgrade that may be responsible for these. We're testing the fix now. Did this, by chance, happen on a cluster with a mix of 0.47.2 and 0.48? No, it was a clean install of the next branch.

v0.47.3 released

2012-06-20 Thread Sage Weil
This is a bugfix release with one major fix and a few small ones:
* osd: disable use of the FIEMAP ioctl by default, as its use was leading to corruption for RBD users
* a few minor compile/build/specfile fixes
I was going to wait for v0.48, but that is still several days away. If you are
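For reference, the option in question can be set per OSD in ceph.conf; a minimal sketch, assuming the "filestore fiemap" option name used by this release:

    [osd]
        ; FIEMAP is now off by default; only enable it if you have
        ; verified it does not corrupt data on your kernel/filesystem
        filestore fiemap = false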

rgw objects cleanup

2012-06-20 Thread Yehuda Sadeh
One of the issues we have now with rgw is that it requires running a maintenance utility every day to remove old objects. These objects were left behind for a while so that any pending read could complete. There is currently no way to know whether there are reads in progress on any object,
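For context, the daily maintenance step referred to is a radosgw-admin invocation; a hedged sketch, assuming the "temp remove" subcommand and --date flag available in this era:

    # prune leftover rgw objects older than the given date (run daily, e.g. from cron)
    radosgw-admin temp remove --date=2012-06-19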

Re: Heavy speed difference between rbd and custom pool

2012-06-20 Thread Dan Mick
On 06/19/2012 11:46 PM, Stefan Priebe - Profihost AG wrote: On 19.06.2012 18:29, Dan Mick wrote: The number doesn't change currently (and can't currently be set manually). OK, thanks; so for extending the storage this will be pretty important. Do you have any plans for which version this fe

Re: still crashing osds with next branch

2012-06-20 Thread Sage Weil
Just a quick update: there were some problems with doing a rolling upgrade that may be responsible for these. We're testing the fix now. Did this, by chance, happen on a cluster with a mix of 0.47.2 and 0.48? sage On Wed, 20 Jun 2012, Stefan Priebe wrote: > Does nobody have an idea? Should I open up b

Re: filestore btrfs trans

2012-06-20 Thread Dan Mick
On 06/20/2012 09:09 AM, Sage Weil wrote: On Wed, 20 Jun 2012, Stefan Priebe - Profihost AG wrote: > Hello list, > > I've looked at the wiki (http://ceph.com/wiki/Ceph.conf) but there is no exact > description of what filestore btrfs trans does. It seems it is false by default. > > Should it b

Re: Possible deadlock condition

2012-06-20 Thread Sage Weil
On Wed, 20 Jun 2012, Mandell Degerness wrote: > The prior thread seems to refer to something fixed in 3.0.X; we are > running 3.2.18. Also, in answer to the previous question, we see the > error both on systems whose disks are 82% full and on systems at 5% full. > > Anyone have any ideas about

Re: Possible deadlock condition

2012-06-20 Thread Mandell Degerness
The prior thread seems to refer to something fixed in 3.0.X; we are running 3.2.18. Also, in answer to the previous question, we see the error both on systems whose disks are 82% full and on systems at 5% full. Anyone have any ideas about how to resolve the deadlock? Do we have to configure mys

Re: ceph osd crush add - unknown command crush

2012-06-20 Thread Gregory Farnum
On Wed, Jun 20, 2012 at 2:53 PM, Travis Rhoden wrote: > This incorrect syntax is still published in the docs at > http://ceph.com/docs/master/ops/manage/crush/#adjusting-crush > > Are the docs that end up on that page in GitHub? I'd be happy to > start making fixes and issuing pull requests. I've
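Since the published pages are built from the doc/ tree of ceph.git, a doc fix could flow roughly like this (branch and file names below are illustrative):

    git clone https://github.com/ceph/ceph.git
    cd ceph
    git checkout -b doc-crush-syntax
    # edit the relevant .rst source under doc/, e.g. the CRUSH management page
    git commit -a -m "doc: fix ceph osd crush command syntax"
    # push to a personal fork and open a pull request against ceph/ceph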

Re: ceph osd crush add - unknown command crush

2012-06-20 Thread Travis Rhoden
This incorrect syntax is still published in the docs at http://ceph.com/docs/master/ops/manage/crush/#adjusting-crush Are the docs that end up on that page in GitHub? I'd be happy to start making fixes and issuing pull requests. I've run into a few gotchas when following the commands on those pag
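For reference, the form the CLI actually accepted at the time was "crush set" rather than "crush add"; a hedged example, with placeholder id, weight, and bucket names:

    # syntax as of ~v0.47; osd id, weight and CRUSH location are placeholders
    ceph osd crush set 0 osd.0 1.0 pool=default host=ssdstor000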

Re: Should an OSD crash when journal device is out of space?

2012-06-20 Thread Matthew Roy
I hit this a couple of times and wondered the same thing. Why does the OSD need to bail when it runs out of journal space? On Wed, Jun 20, 2012 at 3:56 PM, Travis Rhoden wrote: > Not sure if this is a bug or not. It was definitely user error -- but > since the OSD process bailed, figured I would re

Updated guide for chef installs, from where the docs stop onward

2012-06-20 Thread Tommi Virtanen
So, the docs for the Chef install got some doc lovin' lately. It's all at http://ceph.com/docs/master/install/chef/ and http://ceph.com/docs/master/config-cluster/chef/ but the docs still stop short of having an actual running Ceph cluster. Also, while the writing was in progress, I managed to lift the

Re: still crashing osds with next branch

2012-06-20 Thread Stefan Priebe
On 20.06.2012 20:11, Sage Weil wrote: On Wed, 20 Jun 2012, Stefan Priebe wrote: On 20.06.2012 19:35, Sage Weil wrote: On Wed, 20 Jun 2012, Stefan Priebe wrote: Does nobody have an idea? Should I open up bugs in the tracker? Let's open up bugs. If they are reproducible, debug osd = 20 logs would be aw

Should an OSD crash when journal device is out of space?

2012-06-20 Thread Travis Rhoden
Not sure if this is a bug or not. It was definitely user error -- but since the OSD process bailed, figured I would report it. I had /tmpfs mounted with 2.5GB of space: tmpfs on /tmpfs type tmpfs (rw,size=2560m). Then I decided to increase my journal size to 5G, but forgot to increase the limit
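A sketch of the mismatch, assuming the standard "osd journal size" option (specified in MB) against the tmpfs mount shown above:

    # ceph.conf asks for a 5 GB journal...
    [osd]
        osd journal size = 5120    ; MB

    # ...but the tmpfs holding it is still capped at 2.5 GB, so it would
    # need remounting first, e.g.:
    mount -o remount,size=6g /tmpfs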

Re: how to monitor journal utilization

2012-06-20 Thread Sage Weil
On Wed, 20 Jun 2012, Travis Rhoden wrote: > Is there a way I can monitor the utilization of the OSD journal? I > would like to know if data is coming in fast enough to fill the > journal, and thus slow down all the subsequent writes. > > Once the journal is full, that becomes the bottleneck for w

how to monitor journal utilization

2012-06-20 Thread Travis Rhoden
Is there a way I can monitor the utilization of the OSD journal? I would like to know if data is coming in fast enough to fill the journal, and thus slow down all the subsequent writes. Once the journal is full, that becomes the bottleneck for writes, no? (assuming that writes to the journal are
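One place to look is the OSD's admin socket perf counters, which include journal and filestore statistics; a hedged example, assuming the default socket path and the "perf dump" command (older builds spell it "perfcounters_dump"):

    # inspect journal-related counters for osd.0
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump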

Re: still crashing osds with next branch

2012-06-20 Thread Sage Weil
On Wed, 20 Jun 2012, Stefan Priebe wrote: > On 20.06.2012 19:35, Sage Weil wrote: > > On Wed, 20 Jun 2012, Stefan Priebe wrote: > > > Does nobody have an idea? Should I open up bugs in the tracker? > > > > Let's open up bugs. If they are reproducible, debug osd = 20 logs would > > be awesome! > > > > Also,

Re: still crashing osds with next branch

2012-06-20 Thread Stefan Priebe
On 20.06.2012 19:35, Sage Weil wrote: On Wed, 20 Jun 2012, Stefan Priebe wrote: Does nobody have an idea? Should I open up bugs in the tracker? Let's open up bugs. If they are reproducible, debug osd = 20 logs would be awesome! Also, the crash you mentioned in your earlier email we did see: htt

Re: still crashing osds with next branch

2012-06-20 Thread Sage Weil
On Wed, 20 Jun 2012, Stefan Priebe wrote: > Does nobody have an idea? Should I open up bugs in the tracker? Let's open up bugs. If they are reproducible, debug osd = 20 logs would be awesome! Also, the crash you mentioned in your earlier email we did see: http://tracker.newdream.net/issues/2599 If
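For anyone reproducing this, the debug level can be raised persistently in ceph.conf or injected at runtime; a small sketch (the runtime form assumes the OSD is still responsive):

    # persistent, in ceph.conf:
    [osd]
        debug osd = 20

    # or at runtime, e.g. for osd.13:
    ceph osd tell 13 injectargs '--debug-osd 20'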

Re: still crashing osds with next branch

2012-06-20 Thread Stefan Priebe
Does nobody have an idea? Should I open up bugs in the tracker? On 20.06.2012 15:30, Stefan Priebe - Profihost AG wrote: Hmm, always the same OSDs are crashing again, mostly while shutting down or restarting a KVM machine. This time: ### Server 1 0> 2012-06-20 11:59:0

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Tommi Virtanen
On Wed, Jun 20, 2012 at 9:46 AM, Stefan Priebe wrote: >> Actually, the short hostname of the machine is completely unrelated to >> what IP addresses etc. it has. The hostname is a configurable string. >> We treat it as such -- it's just expected to be a unique identifier >> for the host. > Correct.

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Stefan Priebe
On 20.06.2012 at 18:39, Tommi Virtanen wrote: > On Wed, Jun 20, 2012 at 2:20 AM, Stefan Priebe - Profihost AG > wrote: >>> What does hostname | cut -d . -f 1 say? >> >> It says ssdstor000i, as the machine defaults to its public IP. >> >> Wouldn't it make sense to go through all aliases set

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Tommi Virtanen
On Wed, Jun 20, 2012 at 2:20 AM, Stefan Priebe - Profihost AG wrote: >> What does hostname | cut -d . -f 1 say? > > It says ssdstor000i, as the machine defaults to its public IP. > > Wouldn't it make sense to go through all aliases set in /etc/hosts which > point to a local IP? Actually, the sho

Re: filestore btrfs trans

2012-06-20 Thread Sage Weil
On Wed, 20 Jun 2012, Stefan Priebe - Profihost AG wrote: > Hello list, > > I've looked at the wiki (http://ceph.com/wiki/Ceph.conf) but there is no exact > description of what filestore btrfs trans does. It seems it is false by default. > > Should it be enabled when using btrfs? Nope. This is the o
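For completeness, the option and its default as discussed here; a one-line ceph.conf sketch:

    [osd]
        ; wraps filestore operations in btrfs transaction ioctls;
        ; false by default, and per this thread it should stay off
        filestore btrfs trans = false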

Re: [PATCH] net/ceph/osd_client.c add sem to osdmap destroy

2012-06-20 Thread Sage Weil
On Wed, 20 Jun 2012, Guan Jun He wrote: > >>> On 6/19/2012 at 11:33 PM, Sage Weil wrote: > > [Sorry for not responding earlier!] > > > > On Tue, 19 Jun 2012, Guan Jun He wrote: > >> Hi, > >> > >> Do you think this is needed? > >> The osdmap update needs to hold this sem. As in function > >> void ceph_osdc_handle_map(struct ceph_osd_cl

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Wido den Hollander
On 06/20/2012 11:20 AM, Stefan Priebe - Profihost AG wrote: On 20.06.2012 11:13, Wido den Hollander wrote: On 06/20/2012 10:59 AM, Stefan Priebe - Profihost AG wrote: Hello list, I have some problems finding the correct combination of ceph.conf and /etc/hosts for the init script to work correctly

Re: still crashing osds with next branch

2012-06-20 Thread Stefan Priebe - Profihost AG
Hmm, always the same OSDs are crashing again, mostly while shutting down or restarting a KVM machine. This time: ### Server 1 0> 2012-06-20 11:59:06.499813 7f1664052700 -1 *** Caught signal (Segmentation fault) ** in thread 7f1664052700 ceph version 0.4

still crashing osds with next branch

2012-06-20 Thread Stefan Priebe - Profihost AG
Hello list, I'm still seeing OSD crashes with the next branch under KVM load. If you need the core dump, please tell me. Here are TWO different crashes; the last log lines follow: ### CRASH 1 ### -3> 2012-06-20 11:59:06.446836 7f1660f4b700 0 osd.13 105 pg[4.64b( v 105'297

Re: [PATCH] net/ceph/osd_client.c add sem to osdmap destroy

2012-06-20 Thread Guan Jun He
>>> On 6/19/2012 at 11:33 PM, Sage Weil wrote: > [Sorry for not responding earlier!] > > On Tue, 19 Jun 2012, Guan Jun He wrote: >> Hi, >> >> Do you think this is needed? >> The osdmap update needs to hold this sem. As in function >> void ceph_osdc_handle_map(struct ceph_osd_cl

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Stefan Priebe - Profihost AG
On 20.06.2012 11:20, Stefan Priebe - Profihost AG wrote: On 20.06.2012 11:13, Wido den Hollander wrote: On 06/20/2012 10:59 AM, Stefan Priebe - Profihost AG wrote: Hello list, I have some problems finding the correct combination of ceph.conf and /etc/hosts for the init script to work correctly.

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Stefan Priebe - Profihost AG
On 20.06.2012 11:13, Wido den Hollander wrote: On 06/20/2012 10:59 AM, Stefan Priebe - Profihost AG wrote: Hello list, I have some problems finding the correct combination of ceph.conf and /etc/hosts for the init script to work correctly. My systems are named: ssdstor000i ssdstor001i ssdstor002i

Re: problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Wido den Hollander
On 06/20/2012 10:59 AM, Stefan Priebe - Profihost AG wrote: Hello list, I have some problems finding the correct combination of ceph.conf and /etc/hosts for the init script to work correctly. My systems are named: ssdstor000i ssdstor001i ssdstor002i What does hostname | cut -d . -f 1 say? This i

problems with ceph.init / etc/hosts and ceph.conf

2012-06-20 Thread Stefan Priebe - Profihost AG
Hello list, I have some problems finding the correct combination of ceph.conf and /etc/hosts for the init script to work correctly. My systems are named: ssdstor000i ssdstor001i ssdstor002i These names point to the public external IP. These systems also have an alias (ssdstor000, ssdstor001) point
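The init script matches the machine's short hostname against the host entries in ceph.conf, so a working pair for this setup might look like the following sketch (the osd.0 section is a placeholder):

    # the init script derives the short name roughly like this:
    hostname | cut -d . -f 1        # -> ssdstor000i on this machine

    # so the matching ceph.conf section must use that exact string:
    [osd.0]
        host = ssdstor000i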