Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
Hello, I tried to upgrade to 0.56.1 this morning as it could help with recovery. No luck so far... What's wrong with your primary OSD? I don't know what's really wrong. The disk seems fine. In general they shouldn't really be crashing that frequently and if you've got a new bug we'd

Rados gateway init timeout with cache

2013-01-08 Thread Yann ROBIN
Hi, We recently experienced an issue with the backplane of our server, resulting in losing half of our OSDs. During that period the rados gateway failed to initialize (timeout). We found that the gateway was hanging in the init_watch function. We recreated our OSDs and we still have this issue, but

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
Hello, I'm wondering if I can get every rb.0.8e10.3e2219d7.* from the OSD drive and cat them together to get back a usable raw volume from which I could recover my data? Everything seems to be there but I don't know the order of the rbd objects. Are the last bytes of the file name the
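The ordering question has a mechanical answer for format 1 images (an assumption here, though the rb.0.* prefix suggests it): the trailing 12 hex digits of each object name are the object index, so a plain lexicographic sort of the names gives the byte order of the volume. A sketch, assuming the default 4 MiB object size:

    # assuming format 1 naming and the default 4 MiB object size:
    #   rb.0.8e10.3e2219d7.000000000000  -> bytes 0..4 MiB
    #   rb.0.8e10.3e2219d7.000000000001  -> bytes 4 MiB..8 MiB
    ls rb.0.8e10.3e2219d7.* | sort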

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Wido den Hollander
On 01/08/2013 01:57 PM, Denis Fondras wrote: Hello, I'm wondering if I can get every rb.0.8e10.3e2219d7.* from the OSD drive and cat them together and get back a usable raw volume from which I could get back my data? Yes, that is doable. The only problem is that RBD is sparse. So you'd

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Wido den Hollander
On 01/08/2013 02:10 PM, Wido den Hollander wrote: On 01/08/2013 01:57 PM, Denis Fondras wrote: Hello, I'm wondering if I can get every rb.0.8e10.3e2219d7.* from the OSD drive and cat them together and get back a usable raw volume from which I could get back my data? Yes, that is doable.
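A minimal sketch of a sparse-aware reassembly, assuming the objects were copied out under their plain rados names (e.g. via rados get), format 1 naming, and the default 4 MiB object size (all assumptions; adjust to the actual image):

    #!/bin/bash
    # Write each object at its offset; objects that were never written
    # stay as holes (zeros), which preserves the sparseness.
    prefix="rb.0.8e10.3e2219d7"
    obj_size=$((4 * 1024 * 1024))
    out="restored.img"
    for f in "$prefix".*; do
        idx=$((16#${f##*.}))   # hex suffix -> object index
        dd if="$f" of="$out" bs="$obj_size" seek="$idx" conv=notrunc 2>/dev/null
    done
    # If the tail of the image is sparse, extend the file to the
    # original image size afterwards, e.g.: truncate -s SIZE "$out"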

RE: Rados gateway init timeout with cache

2013-01-08 Thread Yann ROBIN
Notify and gc objects were unfound; we marked them as lost and now the rados gateway starts. But this means that if some notify objects are not fully available, the rados gateway stops responding.
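The recovery sequence being described is roughly the following (a sketch; the pg id is a placeholder):

    ceph health detail                    # lists PGs with unfound objects
    ceph pg 2.5 list_missing              # inspect what a PG cannot find
    ceph pg 2.5 mark_unfound_lost revert  # give up on them so I/O proceeds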

RE: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Moore, Shawn M
If you know the prefix (which it seems you do) and the original size of the rbd you should be able to use my utility. https://github.com/smmoore/ceph/blob/master/rbd_restore.sh You will need all the rados files in the current working directory you execute the script from. We have used it many

Re: OSD Crashed when runing rbd list

2013-01-08 Thread James Page
On 08/01/13 15:51, Chen, Xiaoxi wrote: I would like to upgrade to 0.56-1 but there is no package for the 3.7 kernel (raring) I uploaded 0.56.1 to Ubuntu Raring this morning - it's published and should ripple through archive mirrors in the next few

Re: what could go wrong with two clusters on the same network?

2013-01-08 Thread Gregory Farnum
On Mon, Dec 31, 2012 at 10:27 AM, Wido den Hollander w...@widodh.nl wrote: Just make sure you use cephx (enabled by default in 0.55) so that you don't accidentally connect to the wrong cluster. Use of cephx will provide an additional layer of protection for the clients, but the OSDs and
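For reference, a sketch of the ceph.conf settings involved; with distinct keys per cluster, a client carrying the wrong keyring simply fails to authenticate rather than talking to the wrong cluster:

    [global]
            auth cluster required = cephx
            auth service required = cephx
            auth client required = cephx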

Re: hit suicide timeout message after upgrade to 0.56

2013-01-08 Thread Gregory Farnum
I'm confused. Isn't the HeartbeatMap all about local thread heartbeating (so, not pings with other OSDs)? I would assume the upgrade and restart just caused a bunch of work and the CPUs got overloaded. -Greg On Thu, Jan 3, 2013 at 8:52 AM, Sage Weil s...@inktank.com wrote: Hi Wido, On Thu, 3

Re: Windows port

2013-01-08 Thread Nick Couchman
On 2013/01/08 at 10:08, Gregory Farnum g...@inktank.com wrote: On Mon, Jan 7, 2013 at 9:36 PM, Cesar Mello cme...@gmail.com wrote: Hi, I have been playing with ceph and reading the docs/thesis the last couple of nights just to learn something during my vacation. I was not expecting to find

branches

2013-01-08 Thread Sage Weil
I'd like to adjust the branches we're maintaining in ceph.git. Currently:

master
 - active development
next
 - frozen for next release
 - bug fixes only
 - regularly merged back into master
testing
 - last development release
 - cherry-pick -x'd critical fixes
 - packages at
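The mechanics behind those bullets, sketched in git (the commit id is a placeholder):

    git checkout master
    git merge next                # next is regularly merged back into master
    git checkout testing
    git cherry-pick -x <commit>   # -x records the original commit id in the message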

Re: branches

2013-01-08 Thread Mark Nelson
On 01/08/2013 12:39 PM, Sage Weil wrote: I'd like to adjust the branches we're maintaining in ceph.git. Currently: master - active development next - frozen for next release - bug fixes only - regularly merged back into master testing - last development release - cherry-pick -x'd

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
Hello, What error message do you get when you try and turn it on? If the daemon is crashing, what is the backtrace? The daemon is crashing. Here is the full log if you want to take a look: http://vps.ledeuns.net/ceph-osd.0.log.gz The RBD rebuild script helped to get the data back. I will

Crushmap Design Question

2013-01-08 Thread Moore, Shawn M
I have been testing ceph for a little over a month now. Our design goal is to have 3 datacenters in different buildings all tied together over 10GbE. Currently there are 10 servers each serving 1 osd in 2 of the datacenters. In the third is one large server with 16 SAS disks serving 8 osds.
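One way to express that layout in a crushmap is a rule that picks leaves under distinct datacenter buckets; a sketch, assuming the hosts are already grouped under datacenter-type buckets (the rule name and ruleset number are invented):

    rule across_dcs {
            ruleset 2
            type replicated
            min_size 2
            max_size 3
            step take default
            step chooseleaf firstn 0 type datacenter
            step emit
    }

With pool size 3 this places one replica in each of the three buildings.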

RE: Rados gateway init timeout with cache

2013-01-08 Thread Yann ROBIN
We lost data in notify and gc. What bothers me is that the rados gateway can start if we deactivate the cache. I think the unavailability of the cache objects shouldn't take down the rados gateway. The option should be more of an I want the cache if available.
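For reference, the cache in question can be switched off in ceph.conf (a sketch; the section name is whatever the gateway instance is called):

    [client.radosgw.gateway]
            rgw cache enabled = false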

Re: Rados gateway init timeout with cache

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 1:11 PM, Yann ROBIN yann.ro...@youscribe.com wrote: We lost data in notify and gc. What bothers me is that the rados gateway can start if we deactivate the cache. I think the unavailability of the cache objects shouldn't take down the rados gateway. The option should be

Re: hit suicide timeout message after upgrade to 0.56

2013-01-08 Thread Sage Weil
On Tue, 8 Jan 2013, Gregory Farnum wrote: I'm confused. Isn't the HeartbeatMap all about local thread heartbeating (so, not pings with other OSDs)? I would assume the upgrade and restart just caused a bunch of work and the CPUs got overloaded. It is. In #3714's case, the OSD was down for a

Adjusting replicas on argonaut

2013-01-08 Thread Bryan Stillwell
I tried increasing the number of metadata replicas from 2 to 3 on my test cluster with the following command: ceph osd pool set metadata size 3 Afterwards it appears that all the metadata placement groups switched to a degraded state and don't seem to be attempting to recover: 2013-01-08

Re: Adjusting replicas on argonaut

2013-01-08 Thread Gregory Farnum
What are your CRUSH rules? Depending on how you set this cluster up, it might not be placing more than one replica in a single host, and you've only got two hosts so it couldn't satisfy your request for 3 copies. -Greg On Tue, Jan 8, 2013 at 2:11 PM, Bryan Stillwell bstillw...@photobucket.com

Re: Adjusting replicas on argonaut

2013-01-08 Thread Bryan Stillwell
That would make sense. Here's what the metadata rule looks like:

rule metadata {
        ruleset 1
        type replicated
        min_size 2
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

On Tue, Jan 8, 2013 at 3:23 PM, Gregory

Re: Adjusting replicas on argonaut

2013-01-08 Thread Gregory Farnum
Yep! The step chooseleaf firstn 0 type host means choose n nodes of type host, and select a leaf under each one of them, where n is the pool size. You only have two hosts so it can't do more than 2 with that rule type. You could do step chooseleaf firstn 0 type device, but that won't guarantee a
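For comparison, a sketch of the looser rule being referred to, which selects OSDs directly and so can put several replicas on one host (the rule name is invented, and whether the leaf type is named osd or device depends on the map's type definitions):

    rule metadata_anyosd {
            ruleset 2
            type replicated
            min_size 2
            max_size 10
            step take default
            step choose firstn 0 type osd
            step emit
    }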

Re: Adjusting replicas on argonaut

2013-01-08 Thread Bryan Stillwell
I appreciate you giving more detail on this. I plan on expanding the test cluster to 5 servers soon, so I'll just wait until then before changing the number of replicas. Thanks, Bryan On Tue, Jan 8, 2013 at 3:49 PM, Gregory Farnum g...@inktank.com wrote: Yep! The step chooseleaf firstn 0 type

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 11:44 AM, Denis Fondras c...@ledeuns.net wrote: Hello, What error message do you get when you try and turn it on? If the daemon is crashing, what is the backtrace? The daemon is crashing. Here is the full log if you want to take a look :

RE: Crushmap Design Question

2013-01-08 Thread Chen, Xiaoxi
Hi, Setting rep size to 3 only makes the data triple-replicated; that means when you fail all OSDs in 2 out of 3 DCs, the data is still accessible. But the monitors are another story: a monitor cluster with 2N+1 nodes requires at least N+1 nodes alive, and indeed this is why you
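Concretely: with one monitor per DC (2N+1 = 3, so N = 1), losing two DCs leaves 1 monitor, short of the N+1 = 2 needed for quorum, even though the surviving DC still holds a full replica of the data.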

v0.48.3 argonaut update released

2013-01-08 Thread Sage Weil
After several months, we have an important update for the argonaut v0.48.x series. This release contains a critical fix that can prevent data loss or corruption in a power loss or kernel panic situation. There are also several fixes for the OSDs and for the radosgw. We recommend all v0.48.x

Re: [PATCH 0/2] Librados aio stat

2013-01-08 Thread Sage Weil
On Mon, 7 Jan 2013, Filippos Giannakos wrote: Hi Josh, On 01/05/2013 02:08 AM, Josh Durgin wrote: On 01/04/2013 05:01 AM, Filippos Giannakos wrote: Hi Team, Is there any progress or any comments regarding the librados aio stat patch? They look good to me. I put them in the

Re: recoverying from 95% full osd

2013-01-08 Thread Roman Hlynovskiy
Hello Mark, ok, adding another OSD is a good option; however, my initial plan was to raise the full ratio watermark and remove unnecessary data. It's clear to me that overfilling one of the OSDs will cause big problems for fs consistency. But... 2 other OSDs still have plenty of space. What is the
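The watermark in question can be raised temporarily (a sketch; 0.98 is an arbitrary example value, and it should be lowered again once space is freed, since a genuinely full OSD is very hard to recover):

    ceph pg set_full_ratio 0.98
    ceph pg set_nearfull_ratio 0.95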

Re: recoverying from 95% full osd

2013-01-08 Thread Roman Hlynovskiy
Thanks a lot Greg, that was the black magic command I was looking for ) I deleted some obsolete data and reached those figures:

chef@cephgw:~$ ./clu.sh exec df -kh|grep osd
/dev/mapper/vg00-osd  252G  153G  100G  61%  /var/lib/ceph/osd/ceph-0
/dev/mapper/vg00-osd  252G  180G   73G  72%

Re: recoverying from 95% full osd

2013-01-08 Thread Sage Weil
On Wed, 9 Jan 2013, Roman Hlynovskiy wrote: Thanks a lot Greg, that was the black magic command I was looking for ) I deleted some obsolete data and reached those figures: chef@cephgw:~$ ./clu.sh exec df -kh|grep osd /dev/mapper/vg00-osd 252G 153G 100G 61% /var/lib/ceph/osd/ceph-0

Re: recoverying from 95% full osd

2013-01-08 Thread Gregory Farnum
On Tuesday, January 8, 2013 at 10:52 PM, Sage Weil wrote: On Wed, 9 Jan 2013, Roman Hlynovskiy wrote: Thanks a lot Greg, that was the black magic command I was looking for ) I deleted some obsolete data and reached those figures: chef@cephgw:~$ ./clu.sh exec df