Re: [ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Gaurav Bafna
The replication size is 3 and min_size is 2. Yes, they don't have enough copies. Ceph should recover from this state by itself to ensure durability. @Tupper: In this bug, each node is hosting only three OSDs. In my setup, every node has 23 OSDs. So this should not be the issue. On Tue,
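The relationship between size, min_size and the acting set described above can be sketched in a few lines of Python. This is a simplified illustration, not Ceph's peering logic: the function name `pg_flags` is hypothetical, and it only models two rules, namely that a PG can serve I/O while it has at least min_size replicas, and that it is undersized while it has fewer than size replicas. The acting set [16,49] is taken from the health output quoted in this thread.

```python
def pg_flags(acting, size, min_size):
    """Return (serves_io, undersized) for one PG.

    Hypothetical helper for illustration only; real Ceph peering
    tracks many more states than these two flags.
    """
    replicas = len(acting)
    serves_io = replicas >= min_size   # enough copies to stay active
    undersized = replicas < size       # fewer copies than the pool wants
    return serves_io, undersized


# Two replicas left in a size=3, min_size=2 pool: still active, but undersized.
print(pg_flags([16, 49], size=3, min_size=2))
```

With these settings a PG that has lost one replica keeps serving I/O, which matches the active+undersized+degraded state reported in the thread.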

Re: [ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Varada Kari
PGs are degraded because they don't have enough copies of the data. What is your replication size? You can refer to http://docs.ceph.com/docs/master/rados/operations/pg-states/ for PG states. Varada On Tuesday 03 May 2016 06:56 PM, Gaurav Bafna wrote: > Also, the old PGs are not mapped to the

Re: [ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Gaurav Bafna
Also, the old PGs are not mapped to the down OSD, as seen from ceph health detail:
pg 5.72 is active+undersized+degraded, acting [16,49]
pg 5.4e is active+undersized+degraded, acting [16,38]
pg 5.32 is active+undersized+degraded, acting [39,19]
pg 5.37 is active+undersized+degraded, acting
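To check a claim like this against a longer listing, the health output can be parsed mechanically. The following is a minimal sketch that assumes the line format shown in this message (`pg <id> is <states>, acting [<osds>]`); the helper names `parse_health_detail` and `pgs_on_osd` are hypothetical, and the OSD id 99 used in the usage note is a placeholder, since the id of the down OSD is not given in this snippet.

```python
import re

# Matches lines such as: pg 5.72 is active+undersized+degraded, acting [16,49]
PG_LINE = re.compile(r"pg (\S+) is (\S+), acting \[([\d,]*)\]")


def parse_health_detail(text):
    """Map each PG id to (list of states, list of acting OSD ids)."""
    pgs = {}
    for match in PG_LINE.finditer(text):
        pgid, state, acting = match.groups()
        osds = [int(x) for x in acting.split(",") if x]
        pgs[pgid] = (state.split("+"), osds)
    return pgs


def pgs_on_osd(pgs, osd_id):
    """Return the sorted ids of PGs whose acting set contains osd_id."""
    return sorted(pgid for pgid, (_, osds) in pgs.items() if osd_id in osds)


sample = """pg 5.72 is active+undersized+degraded, acting [16,49]
pg 5.4e is active+undersized+degraded, acting [16,38]
pg 5.32 is active+undersized+degraded, acting [39,19]"""

pgs = parse_health_detail(sample)
```

Calling `pgs_on_osd(pgs, 99)` returns an empty list when no listed PG has the placeholder OSD 99 in its acting set, while `pgs_on_osd(pgs, 16)` returns the two PGs that include OSD 16.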

Re: [ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Tupper Cole
Yes, the PG *should* get remapped, but that is not always the case. For discussion on this, check out the tracker below. Your particular circumstances may be a little different, but the idea is the same. http://tracker.ceph.com/issues/3806 On Tue, May 3, 2016 at 9:16 AM, Gaurav Bafna

Re: [ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Gaurav Bafna
Thanks Tupper for replying. Shouldn't the PG be remapped to other OSDs? Yes, removing the OSD from the cluster results in full recovery. But that should not be needed, right? On Tue, May 3, 2016 at 6:31 PM, Tupper Cole wrote: > The degraded pgs are mapped to the down

Re: [ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Tupper Cole
The degraded PGs are mapped to the down OSD and have not mapped to a new OSD. Removing the OSD would likely result in a full recovery. As a note, having two monitors (or any even number of monitors) is not recommended. If either monitor goes down, you will lose quorum. The recommended number of
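The point about two monitors follows from majority arithmetic: a quorum needs more than half of the monitors. A short sketch, with hypothetical function names, shows why an even count adds no failure tolerance over the next lower odd count.

```python
def quorum_size(monitors):
    """Smallest strict majority of the given number of monitors."""
    return monitors // 2 + 1


def tolerated_failures(monitors):
    """How many monitors can be down while a majority is still up."""
    return monitors - quorum_size(monitors)


for n in (1, 2, 3, 4, 5):
    print(f"{n} monitors: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With 2 monitors the majority is 2, so losing either one loses quorum, the same tolerance as a single monitor. Going from 3 to 4 monitors likewise leaves the tolerated failure count at 1.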

[ceph-users] Cluster not recovering after OSD deamon is down

2016-05-03 Thread Gaurav Bafna
Hi Cephers, I am running a very small cluster of 3 storage and 2 monitor nodes. After I kill 1 OSD daemon, the cluster never recovers fully. 9 PGs remain undersized for an unknown reason. After I restart that OSD daemon, the cluster recovers in no time. The size of all pools is 3 and min_size is
