Re: recoverying from 95% full osd

2013-01-08 Thread Gregory Farnum
On Tuesday, January 8, 2013 at 10:52 PM, Sage Weil wrote: > On Wed, 9 Jan 2013, Roman Hlynovskiy wrote: > > Thanks a lot Greg, > > > > that was the black magic command I was looking for ) > > > > I deleted some obsolete data and reached those figures: > > > > chef@cephgw:~$ ./clu.sh

Re: recoverying from 95% full osd

2013-01-08 Thread Sage Weil
On Wed, 9 Jan 2013, Roman Hlynovskiy wrote: > Thanks a lot Greg, > > that was the black magic command I was looking for ) > > I deleted some obsolete data and reached those figures: > > chef@cephgw:~$ ./clu.sh exec "df -kh"|grep osd > /dev/mapper/vg00-osd 252G 153G 100G 61% /var/lib/ceph/osd

Re: recoverying from 95% full osd

2013-01-08 Thread Roman Hlynovskiy
Thanks a lot Greg, that was the black magic command I was looking for ) I deleted some obsolete data and reached those figures: chef@cephgw:~$ ./clu.sh exec "df -kh"|grep osd /dev/mapper/vg00-osd 252G 153G 100G 61% /var/lib/ceph/osd/ceph-0 /dev/mapper/vg00-osd 252G 180G 73G 72% /var/lib

Re: recoverying from 95% full osd

2013-01-08 Thread Roman Hlynovskiy
Hello Mark, ok, adding another osd is a good option; however, my initial plan was to raise the full ratio watermark and remove unnecessary data. It's clear to me that overfilling one of the OSDs will cause big problems for fs consistency. But... the 2 other OSDs still have plenty of space. What is the
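For context: the "full ratio watermark" Roman mentions is usually raised with the bobtail-era mon commands sketched below. This is only a sketch, since the exact command used in the thread is cut off above, and the values shown are examples rather than recommendations:

    ceph pg set_full_ratio 0.98
    ceph pg set_nearfull_ratio 0.95
    # lower the ratios back to the defaults (0.95 / 0.85) once enough data has been removed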

Re: [PATCH 0/2] Librados aio stat

2013-01-08 Thread Sage Weil
On Mon, 7 Jan 2013, Filippos Giannakos wrote: > Hi Josh, > > On 01/05/2013 02:08 AM, Josh Durgin wrote: > > On 01/04/2013 05:01 AM, Filippos Giannakos wrote: > > > Hi Team, > > > > > > Is there any progress or any comments regarding the librados aio stat > > > patch ? > > > > They look good to m

v0.48.3 argonaut update released

2013-01-08 Thread Sage Weil
After several months, we have an important update for the argonaut v0.48.x series. This release contains a critical fix that can prevent data loss or corruption in a power loss or kernel panic situation. There are also several fixes for the OSDs and for the radosgw. We recommend all v0.48.x use

RE: Crushmap Design Question

2013-01-08 Thread Chen, Xiaoxi
Hi, Setting rep size to 3 only makes the data triple-replicated; that means when you "fail" all OSDs in 2 out of 3 DCs, the data is still accessible. But the monitor is another story: for a monitor cluster with 2N+1 nodes, at least N+1 nodes must be alive, and indeed this is why you Ce
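A worked example of that quorum arithmetic (numbers are illustrative, not taken from the thread): with 2N+1 = 3 monitors, one per datacenter, N+1 = 2 must remain alive, so losing any single DC keeps quorum while losing two does not; with 5 monitors split 2/2/1 across the DCs, losing both two-monitor DCs together leaves only 1 of the required 3.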

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 11:44 AM, Denis Fondras wrote: > Hello, > > >> What error message do you get when you try and turn it on? If the >> daemon is crashing, what is the backtrace? > > > The daemon is crashing. Here is the full log if you want to take a look : > http://vps.ledeuns.net/ceph-osd.0.

Re: Adjusting replicas on argonaut

2013-01-08 Thread Bryan Stillwell
I appreciate you giving more detail on this. I plan on expanding the test cluster to 5 servers soon, so I'll just wait until then before changing the number of replicas. Thanks, Bryan On Tue, Jan 8, 2013 at 3:49 PM, Gregory Farnum wrote: > Yep! The "step chooseleaf firstn 0 type host" means "ch

Re: Adjusting replicas on argonaut

2013-01-08 Thread Gregory Farnum
Yep! The "step chooseleaf firstn 0 type host" means "choose n nodes of type host, and select a leaf under each one of them", where n is the pool size. You only have two hosts so it can't do more than 2 with that rule type. You could do "step chooseleaf firstn 0 type device", but that won't guarante

Re: Adjusting replicas on argonaut

2013-01-08 Thread Bryan Stillwell
That would make sense. Here's what the metadata rule looks like:

rule metadata {
        ruleset 1
        type replicated
        min_size 2
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

On Tue, Jan 8, 2013 at 3:23 PM, Gregory Farnu

Re: Adjusting replicas on argonaut

2013-01-08 Thread Gregory Farnum
What are your CRUSH rules? Depending on how you set this cluster up, it might not be placing more than one replica in a single host, and you've only got two hosts so it couldn't satisfy your request for 3 copies. -Greg On Tue, Jan 8, 2013 at 2:11 PM, Bryan Stillwell wrote: > I tried increasing th

Adjusting replicas on argonaut

2013-01-08 Thread Bryan Stillwell
I tried increasing the number of metadata replicas from 2 to 3 on my test cluster with the following command:

ceph osd pool set metadata size 3

Afterwards it appears that all the metadata placement groups switched to a degraded state and don't seem to be attempting to recover: 2013-01-08 14:49:

Re: "hit suicide timeout" message after upgrade to 0.56

2013-01-08 Thread Sage Weil
On Tue, 8 Jan 2013, Gregory Farnum wrote: > I'm confused. Isn't the HeartbeatMap all about local thread > heartbeating (so, not pings with other OSDs)? I would assume the > upgrade and restart just caused a bunch of work and the CPUs got > overloaded. It is. In #3714's case, the OSD was down for

Re: Rados gateway init timeout with cache

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 1:11 PM, Yann ROBIN wrote: > We lost data in notify and gc. What bothers me is that the rados gateway can > start if we deactivate the cache. > I think the availability of the cache objects shouldn't take down the rados > gateway. The option should be more a "I want the ca

RE: Rados gateway init timeout with cache

2013-01-08 Thread Yann ROBIN
We lost data in notify and gc. What bothers me is that the rados gateway can start if we deactivate the cache. I think the availability of the cache objects shouldn't take down the rados gateway. The option should be more of an "I want the cache if available". -Original Message- From: Gregor

Crushmap Design Question

2013-01-08 Thread Moore, Shawn M
I have been testing ceph for a little over a month now. Our design goal is to have 3 datacenters in different buildings all tied together over 10GbE. Currently there are 10 servers each serving 1 osd in 2 of the datacenters. In the third is one large server with 16 SAS disks serving 8 osds.
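A hedged sketch of the kind of rule such a three-datacenter layout usually aims for, assuming a "datacenter" bucket type is defined in the CRUSH map (the rule name and numbers are illustrative, not taken from the thread):

    rule replicated_across_dcs {
            ruleset 3
            type replicated
            min_size 2
            max_size 3
            step take default
            step chooseleaf firstn 0 type datacenter
            step emit
    }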

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
Hello, What error message do you get when you try and turn it on? If the daemon is crashing, what is the backtrace? The daemon is crashing. Here is the full log if you want to take a look : http://vps.ledeuns.net/ceph-osd.0.log.gz The RBD rebuild script helped to get the data back. I will n

Re: branches

2013-01-08 Thread Mark Nelson
On 01/08/2013 12:39 PM, Sage Weil wrote: I'd like to adjust the branches we're maintaining in ceph.git. Currently:

master - active development
next - frozen for next release - bug fixes only - regularly merged back into master
testing - last development release - cherry-pick -x'd cr

branches

2013-01-08 Thread Sage Weil
I'd like to adjust the branches we're maintaining in ceph.git. Currently:

master - active development
next - frozen for next release - bug fixes only - regularly merged back into master
testing - last development release - cherry-pick -x'd critical fixes - packages at ceph.com/debian-testi
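For readers unfamiliar with the notation, "cherry-pick -x'd" refers to plain git usage: -x appends a "(cherry picked from commit ...)" line to the new commit message so the origin of the fix stays traceable. A generic sketch, with branch and commit names as placeholders rather than anything from the thread:

    git checkout testing
    git cherry-pick -x <sha-of-critical-fix>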

Re: Windows port

2013-01-08 Thread Nick Couchman
>>> On 2013/01/08 at 10:08, Gregory Farnum wrote: > On Mon, Jan 7, 2013 at 9:36 PM, Cesar Mello wrote: >> Hi, >> >> I have been playing with ceph and reading the docs/thesis the last >> couple of nights just to learn something during my vacation. I was not >> expecting to find such an awesome an

Re: "hit suicide timeout" message after upgrade to 0.56

2013-01-08 Thread Gregory Farnum
I'm confused. Isn't the HeartbeatMap all about local thread heartbeating (so, not pings with other OSDs)? I would assume the upgrade and restart just caused a bunch of work and the CPUs got overloaded. -Greg On Thu, Jan 3, 2013 at 8:52 AM, Sage Weil wrote: > Hi Wido, > > On Thu, 3 Jan 2013, Wido

Re: what could go wrong with two clusters on the same network?

2013-01-08 Thread Gregory Farnum
On Mon, Dec 31, 2012 at 10:27 AM, Wido den Hollander wrote: > Just make sure you use cephx (enabled by default in 0.55) so that you don't > accidentally connect to the wrong cluster. Use of cephx will provide an additional layer of protection for the clients, but the OSDs and monitors (the only o
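The cephx settings Wido refers to live in ceph.conf; a minimal sketch, assuming the 0.55+ defaults rather than anything stated in the thread:

    [global]
            auth cluster required = cephx
            auth service required = cephx
            auth client required = cephx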

Re: recoverying from 95% full osd

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 2:42 AM, Roman Hlynovskiy wrote: > Hello, > > I am running ceph v0.56 and at the moment trying to recover ceph which > got completely stuck after 1 osd got filled to 95%. Looks like the > distribution algorithm is not perfect since all 3 OSDs I use are > 256Gb each, howeve

Re: OSD Crashed when runing "rbd list"

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 7:51 AM, Chen, Xiaoxi wrote: > Hi List, > Every time I ran "rbd list" after creating a lot of rbd volumes (more > than 100s), certain OSDs will die; osd.65 dies first and then osd.35 > (osd.65, that's the fifth disk on the sixth host) will die. > Is it a bug in 0.55?

Re: ceph caps (Ganesha + Ceph pnfs)

2013-01-08 Thread Matt W. Benjamin
Hi Sage, - "Sage Weil" wrote: > > Your prevoius question made it sound like the DS was interacting with > > libcephfs and dealing with (some) MDS capabilities. Is that right? > > I wonder if a much simpler approach would be to make a different fh > format > or type, and just cram the in

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Gregory Farnum
On Tue, Jan 8, 2013 at 12:44 AM, Denis Fondras wrote: >> What's wrong with your primary OSD? > > > I don't know what's really wrong. The disk seems fine. What error message do you get when you try and turn it on? If the daemon is crashing, what is the backtrace? -Greg -- To unsubscribe from this

Re: Windows port

2013-01-08 Thread Gregory Farnum
On Mon, Jan 7, 2013 at 9:36 PM, Cesar Mello wrote: > Hi, > > I have been playing with ceph and reading the docs/thesis the last > couple of nights just to learn something during my vacation. I was not > expecting to find such an awesome and state of the art project. > Congratulations for the great

Re: Rados gateway init timeout with cache

2013-01-08 Thread Gregory Farnum
To clarify, you lost the data on half of your OSDs? And it sounds like they weren't in separate CRUSH failure domains? Given that, yep, you've lost some data. :( On Tue, Jan 8, 2013 at 5:41 AM, Yann ROBIN wrote: > Notify and gc objects were unfound, we marked them as lost and now the rados > s

Re: recoverying from 95% full osd

2013-01-08 Thread Mark Nelson
On 01/08/2013 04:42 AM, Roman Hlynovskiy wrote: Hello, I am running ceph v0.56 and at the moment trying to recover ceph which got completely stuck after 1 osd got filled to 95%. Looks like the distribution algorithm is not perfect since all 3 OSDs I use are 256Gb each, however one of them got

Re: OSD Crashed when runing "rbd list"

2013-01-08 Thread James Page
On 08/01/13 15:51, Chen, Xiaoxi wrote: > I would like to upgrade to 0.56-1 but there is no package for the 3.7 > kernel (raring) I uploaded 0.56.1 to Ubuntu Raring this morning - it's published and should ripple through archive mirrors in the next few hou

OSD Crashed when runing "rbd list"

2013-01-08 Thread Chen, Xiaoxi
Hi List, Every time I ran "rbd list" after creating a lot of rbd volumes (more than 100s), certain OSDs will die; osd.65 dies first and then osd.35 (osd.65, that's the fifth disk on the sixth host) will die. Is it a bug in 0.55? My ceph version is 0.55-1 with the 3.7 kernel. I would lik

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
On 08/01/2013 14:51, Moore, Shawn M wrote: If you know the prefix (which it seems you do) and the original size of the rbd you should be able to use my utility. https://github.com/smmoore/ceph/blob/master/rbd_restore.sh You will need all the rados files in the current working directory you

Re: Windows port

2013-01-08 Thread Dino Yancey
Hi, I am also curious if a Windows port, specifically the client-side, is on the roadmap. Dino On Mon, Jan 7, 2013 at 11:36 PM, Cesar Mello wrote: > Hi, > > I have been playing with ceph and reading the docs/thesis the last > couple of nights just to learn something during my vacation. I was no

RE: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Moore, Shawn M
If you know the prefix (which it seems you do) and the original size of the rbd you should be able to use my utility. https://github.com/smmoore/ceph/blob/master/rbd_restore.sh You will need all the rados files in the current working directory you execute the script from. We have used it many

RE: Rados gateway init timeout with cache

2013-01-08 Thread Yann ROBIN
Notify and gc objects were unfound; we marked them as lost and now the rados gateway starts. But this means that if some notify objects are not fully available, the rados gateway stops responding. -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] O
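For reference, marking unfound objects lost is normally done per placement group; a hedged sketch (the pg id is a placeholder, and "revert" rolls each object back to a prior version or deletes it if none exists):

    ceph health detail                      # lists the PGs reporting unfound objects
    ceph pg 3.7f mark_unfound_lost revert   # 3.7f is a placeholder pg id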

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Wido den Hollander
On 01/08/2013 02:10 PM, Wido den Hollander wrote: On 01/08/2013 01:57 PM, Denis Fondras wrote: Hello, I'm wondering if I can get every "rb.0.8e10.3e2219d7.*" from the OSD drive and cat them together and get back a usable raw volume from which I could get back my data ? Yes, that is doable. T

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Wido den Hollander
On 01/08/2013 01:57 PM, Denis Fondras wrote: Hello, I'm wondering if I can get every "rb.0.8e10.3e2219d7.*" from the OSD drive and cat them together and get back a usable raw volume from which I could get back my data ? Yes, that is doable. The only problem is that RBD is sparse. So you'd ha

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
Hello, I'm wondering if I can get every "rb.0.8e10.3e2219d7.*" from the OSD drive and cat them together and get back a usable raw volume from which I could get back my data ? Everything seems to be there but I don't know the order of the rbd objects. Are the last bytes of the file name the o
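A minimal sketch of the reassembly idea being discussed, assuming format-1 image objects whose trailing hex field is the object index and the default 4 MB object size (both assumptions; missing objects simply remain holes in the sparse output file):

    truncate -s <original-image-size> restored.img      # the size is not known here, left as a placeholder
    for f in rb.0.8e10.3e2219d7.*; do
        idx=$((16#${f##*.}))                             # hex suffix -> object index
        dd if="$f" of=restored.img bs=4M seek=$idx conv=notrunc
    done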

Rados gateway init timeout with cache

2013-01-08 Thread Yann ROBIN
Hi, We recently experienced an issue with the backplane of our server, resulting in losing half of our OSDs. During that period the rados gateway failed to initialize (timeout). We found that the gateway was hanging in the init_watch function. We recreated our OSDs and we still have this issue, but p

Re: Is Ceph recovery able to handle massive crash

2013-01-08 Thread Denis Fondras
Hello, I tried to upgrade to 0.56.1 this morning as it could help with recovery. No luck so far... What's wrong with your primary OSD? I don't know what's really wrong. The disk seems fine. In general they shouldn't really be crashing that frequently and if you've got a new bug we'd like