Re: RADOS + deep scrubbing performance issues in production environment

2015-07-10 Thread icq2206241
All IO drops to ZERO IOPS for 1-15 minutes during deep-scrub on my cluster. There is clearly a locking bug! I have VMs, and every day, several times, sometimes on all of them, disk IO _completely_ stops. The disk queue grows, 0 IOPS are performed, services die with timeouts... At the sam

Re: RADOS + deep scrubbing performance issues in production environment

2014-01-28 Thread Filippos Giannakos
On Tue, Jan 28, 2014 at 01:30:46AM -0500, Mike Dawson wrote:
> On 1/27/2014 1:45 PM, Sage Weil wrote:
> > There is also
> >     ceph osd set noscrub
> > and then later
> >     ceph osd unset noscrub
>
> In my experience scrub isn't nearly as much of a problem as
> deep-scrub. On an IOPS con

Re: RADOS + deep scrubbing performance issues in production environment

2014-01-28 Thread Filippos Giannakos
On Mon, Jan 27, 2014 at 10:45:48AM -0800, Sage Weil wrote:
> There is also
>     ceph osd set noscrub
> and then later
>     ceph osd unset noscrub
> I forget whether this pauses an in-progress PG scrub or just makes it stop
> when it gets to the next PG boundary.
>
> sage
I bumped into t

Re: RADOS + deep scrubbing performance issues in production environment

2014-01-28 Thread Filippos Giannakos
On Mon, Jan 27, 2014 at 01:10:23PM -0500, Kyle Bader wrote:
> > Are there any tools we are not aware of for controlling, possibly pausing,
> > deep-scrub and/or getting some progress about the procedure?
> > Also, since I believe it would be a bad practice to disable deep-scrubbing,
> > do you
> >

Re: RADOS + deep scrubbing performance issues in production environment

2014-01-27 Thread Mike Dawson
On 1/27/2014 1:45 PM, Sage Weil wrote:
> There is also
>     ceph osd set noscrub
> and then later
>     ceph osd unset noscrub
In my experience scrub isn't nearly as much of a problem as deep-scrub. On an IOPS-constrained cluster with writes approaching the available aggregate spindle performance minu

Re: RADOS + deep scrubbing performance issues in production environment

2014-01-27 Thread Sage Weil
There is also
    ceph osd set noscrub
and then later
    ceph osd unset noscrub
I forget whether this pauses an in-progress PG scrub or just makes it stop when it gets to the next PG boundary.
sage
On Mon, 27 Jan 2014, Kyle Bader wrote:
> > Are there any tools we are not aware of for controllin
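The set/unset pair above can be wrapped so that the flag is always cleared again, even if the work in between fails. The following is only a sketch under stated assumptions: `scrub_paused` and `run_ceph` are hypothetical helper names, not part of Ceph, and the `ceph` CLI is assumed to be on PATH. `nodeep-scrub` is the companion OSD flag for deep scrubs; it is not mentioned in this excerpt of the thread. As noted above, it is not clear whether setting the flag interrupts a scrub already in progress or only prevents new ones from starting.

```python
import subprocess
from contextlib import contextmanager


def run_ceph(args):
    """Run the ceph CLI with the given arguments (assumes `ceph` is on PATH)."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def scrub_paused(flags=("noscrub",), runner=run_ceph):
    """Hypothetical helper: set the given OSD flags, then unset them on exit.

    Only flags that were successfully set are unset, in reverse order,
    and the unset step also runs if the body raises an exception.
    """
    set_flags = []
    try:
        for flag in flags:
            runner(["osd", "set", flag])
            set_flags.append(flag)
        yield
    finally:
        for flag in reversed(set_flags):
            runner(["osd", "unset", flag])
```

A call such as `with scrub_paused(flags=("noscrub", "nodeep-scrub")): ...` would then cover a maintenance window or a peak-load period. The `runner` parameter exists only so the sequence of commands can be checked without a live cluster.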

Re: RADOS + deep scrubbing performance issues in production environment

2014-01-27 Thread Kyle Bader
> Are there any tools we are not aware of for controlling, possibly pausing,
> deep-scrub and/or getting some progress about the procedure?
> Also, since I believe it would be a bad practice to disable deep-scrubbing, do you
> have any recommendations of how to work around (or even solve) this is

RADOS + deep scrubbing performance issues in production environment

2014-01-27 Thread Filippos Giannakos
Hello all,
We have been running RADOS in a large-scale, production, public cloud environment for a few months now and we are generally happy with it. However, we experience performance problems when deep scrubbing is active. We managed to reproduce them in our testing cluster running emperor, ev