Re: [ceph-users] cluster failing to recover

2016-07-05 Thread Matyas Koszik
Hi,

The disks died, and were removed with:

    ceph osd out $osd
    ceph osd lost $osd
    ceph osd crush remove $osd
    ceph auth del $osd
    ceph osd rm $osd

When I wrote my earlier mails, I was past the 'lost' or the 'crush remove' step; I'm not sure which. But even the last step didn't fix the issue. It was like this: http://paste
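The removal sequence above can be sketched as a small shell function. This is a hedged illustration, not the poster's actual script: the names `remove_dead_osd`, `run` and `DRY_RUN` are hypothetical, the OSD is assumed to be passed by numeric id, and by default the commands are only printed. It also adds `--yes-i-really-mean-it`, which `ceph osd lost` requires before it will mark an OSD lost, and it addresses CRUSH and auth entries by name (`osd.N`).

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): replay the OSD removal steps
# for one dead OSD. With DRY_RUN unset or 1 the commands are only printed.

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remove_dead_osd ID: ID is the numeric OSD id, e.g. 12 for osd.12.
# Stops at the first command that fails.
remove_dead_osd() {
    id="$1"
    run ceph osd out "$id" || return 1
    # 'osd lost' refuses to act unless this confirmation flag is given.
    run ceph osd lost "$id" --yes-i-really-mean-it || return 1
    # CRUSH and auth entries are addressed by name (osd.N), not by bare id.
    run ceph osd crush remove "osd.$id" || return 1
    run ceph auth del "osd.$id" || return 1
    run ceph osd rm "$id"
}
```

For example, `remove_dead_osd 12` prints the five commands for osd.12, and `DRY_RUN=0 remove_dead_osd 12` would run them against the cluster.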

Re: [ceph-users] cluster failing to recover

2016-07-05 Thread Sean Redmond
Hi,

What happened to the 2 missing OSDs?

    53 osds: 51 up, 51 in

Thanks

On Tue, Jul 5, 2016 at 4:04 PM, Matyas Koszik wrote:
>
> Should you be interested, the solution to this was
> ceph pg $pg mark_unfound_lost delete
> for all pgs that had unfound objects, now the cluster is back in a health

Re: [ceph-users] cluster failing to recover

2016-07-05 Thread Matyas Koszik
Should you be interested, the solution to this was

    ceph pg $pg mark_unfound_lost delete

for all PGs that had unfound objects; now the cluster is back in a healthy state. I think this is very counter-intuitive (why should totally unrelated PGs be affected by this?!), but at least the solution was s
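Applying that fix to every affected PG can be scripted. A minimal sketch, assuming Jewel-era `ceph health detail` output in which each PG with unfound objects appears on a line starting with `pg <pgid>` and containing the word `unfound`; the wording differs between releases, so check it against your own output first. The helper names are hypothetical, and commands are only printed unless `DRY_RUN=0`. Keep in mind that `delete` makes the cluster forget those objects permanently; `revert` is the other mode the command accepts.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) for applying
# "ceph pg $pg mark_unfound_lost delete" to every PG with unfound objects.

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# unfound_pgs: read 'ceph health detail' text on stdin and print, once each,
# the id of every PG on a line that starts with "pg" and mentions "unfound".
# The line format is an assumption; check it against your release.
unfound_pgs() {
    awk '$1 == "pg" && /unfound/ { print $2 }' | sort -u
}

# mark_all_unfound_lost: read 'ceph health detail' text on stdin and mark the
# unfound objects of each listed PG as lost (delete mode). Irreversible.
mark_all_unfound_lost() {
    for pg in $(unfound_pgs); do
        run ceph pg "$pg" mark_unfound_lost delete || return 1
    done
}
```

Usage: `ceph health detail | mark_all_unfound_lost` prints one `ceph pg <pgid> mark_unfound_lost delete` line per affected PG; `ceph health detail | DRY_RUN=0 mark_all_unfound_lost` would execute them.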

Re: [ceph-users] cluster failing to recover

2016-07-03 Thread Oliver Dzombic
Hi,

did you already do something (replacing drives or changing something)?

You have 11 scrub errors and ~11 inconsistent PGs. The inconsistent PGs, for example:

    pg 4.3a7 is stuck unclean for 629.766502, current state active+recovery_wait+degraded+inconsistent, last acting [10,21]

are no
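To see whether the inconsistent PGs share OSDs (for instance a replaced drive), health-detail lines like the one quoted can be reduced to a PG id and its acting set. A minimal sketch: the function name is hypothetical, and it assumes the Jewel-style line format shown above, where the acting set in brackets is the last field.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): from 'ceph health detail' text on
# stdin, print "<pgid> <acting OSDs>" once for every PG whose line mentions
# "inconsistent". Assumes Jewel-style lines that end in an acting set, e.g.
# "pg 4.3a7 is stuck unclean for ..., last acting [10,21]".
inconsistent_pgs() {
    awk '$1 == "pg" && /inconsistent/ {
        acting = $NF                                    # e.g. "[10,21]"
        acting = substr(acting, 2, length(acting) - 2)  # drop the brackets
        print $2, acting
    }' | sort -u
}
```

Running `ceph health detail | inconsistent_pgs` then lists one line per inconsistent PG, which makes recurring OSD ids easy to spot.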

Re: [ceph-users] cluster failing to recover

2016-07-03 Thread Oliver Dzombic
Hi,

please provide:

    ceph health detail
    ceph osd tree

-- 
Best regards
Oliver Dzombic
IP-Interactive
mailto:i...@ip-interactive.de
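A small sketch for gathering the two requested outputs into one file to paste into a reply. The function name and the default file name `ceph-report.txt` are hypothetical; the `CEPH` variable defaults to the `ceph` binary and exists only so the function can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): write 'ceph health detail' and
# 'ceph osd tree' into one file (default: ceph-report.txt) for the list.
# CEPH can override the command used; it defaults to "ceph".
collect_report() {
    out="${1:-ceph-report.txt}"
    ceph_cmd="${CEPH:-ceph}"
    {
        echo "### ceph health detail"
        $ceph_cmd health detail
        echo
        echo "### ceph osd tree"
        $ceph_cmd osd tree
    } > "$out"
}
```

On a cluster node, `collect_report` writes both outputs, each under a short heading, into `ceph-report.txt` in the current directory.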

[ceph-users] cluster failing to recover

2016-07-03 Thread Matyas Koszik
Hi,

I recently upgraded to Jewel (10.2.2), and now I'm confronted with rather strange behavior: recovery does not progress the way it should. If I restart the OSDs on a host, it gets a bit better (or worse), like this:

    50 pgs undersized
    recovery 43775/7057285 objects degraded (0.620%)
    recov
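To tell whether recovery actually progresses between OSD restarts, the degraded ratio from the `recovery X/Y objects degraded` status line can be extracted and compared over time. For the numbers quoted, 43775 / 7057285 × 100 ≈ 0.620%, matching what the cluster reported. A sketch, assuming that line appears on its own in the status output (as in the `ceph -s` health section); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read status text on stdin and,
# for each line of the form "recovery X/Y objects degraded (...)", print
# X, Y and X/Y as a percentage with three decimals.
degraded_pct() {
    awk '$1 == "recovery" && $3 == "objects" && $4 == "degraded" {
        split($2, n, "/")          # "43775/7057285" -> n[1], n[2]
        if (n[2] + 0 > 0)
            printf "%s %s %.3f\n", n[1], n[2], 100 * n[1] / n[2]
    }'
}
```

Running `ceph -s | degraded_pct` before and after restarting the OSDs on a host shows whether the degraded object count actually went down.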