Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-07 Thread David Welch
Thanks for the suggestions. There turned out to be an old testing pool with a replication size of 1 that was causing the issue. Removing that pool resolved it. On 04/06/2017 07:34 PM, Brad Hubbard wrote: > What are size and min_size for pool '7'... and why? On Fri, Apr 7, 2017 at 4:20 AM, David Welch wrote: ...
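
For anyone hitting the same thing, spotting and removing a size-1 test pool looks roughly like this (the pool name below is hypothetical, and exact output formats vary between Ceph releases):

  # Show each pool's replication settings and look for "size 1"
  $ ceph osd pool ls detail

  # Stuck PGs are listed with the pool number as their prefix (e.g. 7.1a is in pool 7)
  $ ceph pg dump_stuck stale

  # Deleting a pool is irreversible: the name must be given twice plus the safety flag
  # (newer releases also require 'mon allow pool delete' to be enabled on the monitors)
  $ ceph osd pool delete old-test-pool old-test-pool --yes-i-really-really-mean-it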

Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread Brad Hubbard
What are size and min_size for pool '7'... and why? On Fri, Apr 7, 2017 at 4:20 AM, David Welch wrote: > Hi, we had a disk on the cluster that was not responding properly and causing 'slow requests'. The OSD on that disk was stopped, then marked down and out. Rebalancing succeeded, but ...
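
For reference, the '7' is the pool number taken from the PG IDs (e.g. 7.1a lives in pool 7), and the settings Brad is asking about can be read back like this (pool name is a placeholder):

  # Map pool number 7 back to a pool name
  $ ceph osd lspools

  # Read the replication settings for that pool
  $ ceph osd pool get <poolname> size
  $ ceph osd pool get <poolname> min_size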

Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread Wido den Hollander
> On 7 April 2017 at 1:04, Ben Hines wrote: > Personally, before extreme measures like marking it lost, I would try bringing up the OSD so it's up and out -- I believe the data will still be found and rebalanced away from it by Ceph. Indeed, do not mark it as lost yet. Start the OSD (1 ...
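
A minimal sketch of that approach, assuming a systemd-based install and osd.12 as a stand-in for the failed OSD:

  # Start the daemon again so the OSD is 'up'
  $ systemctl start ceph-osd@12

  # Leave it (or mark it) 'out' so its data keeps backfilling to other OSDs
  $ ceph osd out 12

  # Watch recovery progress
  $ ceph -w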

Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread Ben Hines
Personally, before extreme measures like marking it lost, I would try bringing up the OSD so it's up and out -- I believe the data will still be found and rebalanced away from it by Ceph. -Ben On Thu, Apr 6, 2017 at 11:20 AM, David Welch wrote: > Hi, we had a disk on the cluster that was not ...
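
To check whether the OSD really is up and out after restarting it, and for comparison what the 'extreme measure' looks like (osd.12 is a placeholder; only run the last command if the disk is truly unrecoverable):

  # 'up/down' is the daemon state, 'in/out' is the CRUSH placement state
  $ ceph osd tree | grep osd.12
  $ ceph osd dump | grep osd.12

  # Last resort only: tells Ceph to give up on data that existed solely on this OSD
  $ ceph osd lost 12 --yes-i-really-mean-it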

[ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread David Welch
Hi, We had a disk on the cluster that was not responding properly and was causing 'slow requests'. The OSD on that disk was stopped, then marked down and out. Rebalancing succeeded, but (some?) PGs from that OSD are now stuck in the stale+active+clean state, which is not being resolved ...
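
For context, 'stale' generally means the monitors have not heard from the PG's primary OSD recently, so a first step is to see which PGs are affected and which OSDs they map to. Something along these lines (the PG ID is a placeholder):

  # List stuck PGs and overall health
  $ ceph health detail
  $ ceph pg dump_stuck stale

  # Ask the monitors which OSDs a given PG maps to (works even if those OSDs are down;
  # 'ceph pg <pgid> query' may hang for a stale PG because it contacts the primary OSD)
  $ ceph pg map 7.1a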