Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-09 Thread Andrey Korolyov
On Fri, Jan 9, 2015 at 7:17 AM, Robert LeBlanc wrote: > Protect against bit rot. Checked on read and on deep scrub. There are still issues (at least in firefly) with FDCache and scrub completion having corrupted on-disk data, so throughout checksumming will not cover every possible corruption cas

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-09 Thread Christian Balzer
On Thu, 8 Jan 2015 21:17:12 -0700 Robert LeBlanc wrote: > On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer wrote: > > On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote: > > Which of course currently means a strongly consistent lockup in these > > scenarios. ^o^ > > That is one way of puttin

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-09 Thread Robert LeBlanc
On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer wrote: > On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote: > Which of course currently means a strongly consistent lockup in these > scenarios. ^o^ That is one way of putting it > Slightly off-topic and snarky, that strong consistency is of

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-08 Thread Christian Balzer
On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote: > On Wed, Jan 7, 2015 at 10:55 PM, Christian Balzer wrote: > > Which of course begs the question of why not having min_size at 1 > > permanently, so that in the (hopefully rare) case of losing 2 OSDs at > > the same time your cluster still

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-08 Thread Gregory Farnum
On Wed, Jan 7, 2015 at 9:55 PM, Christian Balzer wrote: > On Wed, 7 Jan 2015 17:07:46 -0800 Craig Lewis wrote: > >> On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote: >> >> > However, I suspect that temporarily setting min size to a lower number >> > could be enough for the PGs to recover.

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-08 Thread Robert LeBlanc
On Wed, Jan 7, 2015 at 10:55 PM, Christian Balzer wrote: > Which of course begs the question of why not having min_size at 1 > permanently, so that in the (hopefully rare) case of losing 2 OSDs at the > same time your cluster still keeps working (as it should with a size of 3). The idea is that
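
The min_size question raised above can be illustrated with a toy model. This is only a sketch of the general rule that a placement group serves client I/O while at least min_size of its replicas are available; the function name pg_accepts_io and the loop are hypothetical and are not Ceph code or part of the original messages.

```python
def pg_accepts_io(up_replicas: int, min_size: int) -> bool:
    """Toy rule (assumption for illustration): a PG serves client I/O
    only while at least min_size of its replicas are up."""
    return up_replicas >= min_size


size = 3  # replication factor mentioned in the thread

for min_size in (1, 2):
    for lost in range(size + 1):
        up = size - lost
        state = "serves I/O" if pg_accepts_io(up, min_size) else "blocked"
        print(f"size={size} min_size={min_size} lost={lost} up={up}: {state}")
```

In this model, losing 2 of 3 replicas blocks I/O when min_size is 2 but not when min_size is 1, which is the trade-off between availability and the risk of writing to a single remaining copy that the thread is debating.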

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-07 Thread Christian Balzer
On Wed, 7 Jan 2015 17:07:46 -0800 Craig Lewis wrote: > On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote: > > > However, I suspect that temporarily setting min size to a lower number > > could be enough for the PGs to recover. If "ceph osd pool set <pool> > > min_size 1" doesn't get the PGs goin

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-07 Thread Craig Lewis
On Mon, Dec 29, 2014 at 4:49 PM, Alexandre Oliva wrote: > However, I suspect that temporarily setting min size to a lower number > could be enough for the PGs to recover. If "ceph osd pool set <pool> > min_size 1" doesn't get the PGs going, I suppose restarting at least one > of the OSDs involved in t

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
Hi Eneko, nope, new pool has all pgs active+clean, no errors during image creation. The format command just hangs, without error. On 30.12.2014 12:33, Eneko Lacunza wrote: > Hi Christian, > > New pool's pgs also show as incomplete? > > Did you notice something remarkable in ceph logs in th

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Eneko Lacunza
Hi Christian, New pool's pgs also show as incomplete? Did you notice something remarkable in ceph logs in the new pools image format? On 30/12/14 12:31, Christian Eichelmann wrote: Hi Eneko, I was trying a rbd cp before, but that was hanging as well. But I couldn't find out if the source ima

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
Hi Eneko, I was trying a rbd cp before, but that was hanging as well. But I couldn't find out if the source image was causing the hang or the destination image. That's why I decided to try a posix copy. Our cluster is still nearly empty (12TB / 867TB). But as far as I understood (If not, somebody p

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Eneko Lacunza
Hi Christian, Have you tried to migrate the disk from the old storage (pool) to the new one? I think it should show the same problem, but I think it'd be a much easier path to recover than the posix copy. How full is your storage? Maybe you can customize the crushmap, so that some OSDs are

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
Hi Nico and all others who answered, After some more trying to somehow get the pgs in a working state (I've tried force_create_pg, which was putting them in creating state. But that was obviously not true, since after rebooting one of the containing OSDs it went back to incomplete), I decided to

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Alexandre Oliva
On Dec 29, 2014, Christian Eichelmann wrote: > After we got everything up and running again, we still have 3 PGs in the > state incomplete. I was checking one of them directly on the systems > (replication factor is 3). I have run into this myself at least twice before. I had not lost or replac

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Andrey Korolyov
On Mon, Dec 29, 2014 at 12:56 PM, Christian Eichelmann wrote: > Hi all, > > we have a ceph cluster, with currently 360 OSDs in 11 Systems. Last week > we were replacing one OSD System with a new one. During that, we had a > lot of problems with OSDs crashing on all of our systems. But that is > no

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Chad William Seys
Hi Christian, I had a similar problem about a month ago. After trying lots of helpful suggestions, I found none of it worked and I could only delete the affected pools and start over. I opened a feature request in the tracker: http://tracker.ceph.com/issues/10098 If you find a way, let

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Nico Schottelius
Hey Christian, Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]: > [incomplete PG / RBD hanging, osd lost also not helping] that is very interesting to hear, because we had a similar situation with ceph 0.80.7 and had to re-create a pool, after I deleted 3 pg directories to allow OSDs

[ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Christian Eichelmann
Hi all, we have a ceph cluster, with currently 360 OSDs in 11 Systems. Last week we were replacing one OSD System with a new one. During that, we had a lot of problems with OSDs crashing on all of our systems. But that is not our current problem. After we got everything up and running again, we s