Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-09-05 Thread Tyler Bishop
Sent: Tuesday, September 5, 2017 1:17:32 AM Subject: Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub Hi! Thanks for the pointer about leveldb_compact_on_mount, it took a while to get everything compacted, but after that the deep scrub of the offending pg went smooth

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-09-04 Thread Andreas Calminder
Hi! Thanks for the pointer about leveldb_compact_on_mount, it took a while to get everything compacted, but after that the deep scrub of the offending pg went smooth without any suicides. I'm considering using the compact-on-mount feature for all our OSDs in the cluster since they're kind of large
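
For reference, a minimal sketch of how the compact-on-mount setting can be applied; the [osd] section placement and the restart step are assumptions rather than quotes from the thread:

    # /etc/ceph/ceph.conf on the node hosting the affected OSD
    [osd]
        leveldb_compact_on_mount = true

    # restart the OSD so leveldb gets compacted while the daemon starts up
    systemctl restart ceph-osd@<id>

Leaving the option enabled on every OSD trades longer startup times for a freshly compacted leveldb on each boot, which is the trade-off being weighed above.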

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-17 Thread Gregory Farnum
On Thu, Aug 17, 2017 at 1:02 PM, Andreas Calminder wrote: > Hi! > Thanks for getting back to me! > > Clients access the cluster through rgw (s3), we had some big buckets > containing a lot of small files. Prior to this happening I removed a > semi-stale bucket with a

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-17 Thread Andreas Calminder
Hi! Thanks for getting back to me! Clients access the cluster through rgw (s3); we had some big buckets containing a lot of small files. Prior to this happening I removed a semi-stale bucket with a rather large index, 2.5 million objects, all but 30 objects didn't actually exist, which left the
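
For anyone hitting a similar stale bucket index, a hedged sketch of the usual radosgw-admin steps (the bucket name is a placeholder; verify behaviour on your Jewel build before running):

    # inspect the bucket index and clean up stale entries where possible
    radosgw-admin bucket check --bucket=<bucket-name> --check-objects --fix

    # remove the bucket and purge whatever objects remain in it
    radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects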

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-17 Thread Gregory Farnum
On Thu, Aug 17, 2017 at 12:14 AM Andreas Calminder < andreas.calmin...@klarna.com> wrote: > Thanks, > I've modified the timeout successfully, unfortunately it wasn't enough > for the deep-scrub to finish, so I increased the > osd_op_thread_suicide_timeout even higher (1200s), the deep-scrub >
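
A sketch of how such a timeout bump is usually applied, assuming the OSD id is known (injectargs changes are runtime-only, so the value also needs to go into ceph.conf to survive a restart):

    # raise the suicide timeout on the OSD holding the problem PG (runtime only)
    ceph tell osd.<id> injectargs '--osd_op_thread_suicide_timeout 1200'

    # make it persistent in /etc/ceph/ceph.conf on that node
    [osd]
        osd_op_thread_suicide_timeout = 1200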

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-15 Thread Gregory Farnum
Yes, you can set it on the one node. That configuration is for an entirely internal system and can mismatch across OSDs without trouble. On Tue, Aug 15, 2017 at 4:25 PM Andreas Calminder < andreas.calmin...@klarna.com> wrote: > Thanks, I'll try and do that. Since I'm running a cluster with >
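
Assuming the usual ceph.conf section semantics, limiting the change to a single daemon can look like the snippet below; the OSD id is a made-up example:

    # /etc/ceph/ceph.conf on the node hosting the affected OSD only
    [osd.12]
        osd_op_thread_suicide_timeout = 1200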

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-15 Thread Andreas Calminder
Thanks, I'll try and do that. Since I'm running a cluster with multiple nodes, do I have to set this in ceph.conf on all nodes or does it suffice to set it on just the node with that particular osd? On 15 August 2017 at 22:51, Gregory Farnum wrote: > > > On Tue, Aug 15, 2017 at 7:03

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-15 Thread Gregory Farnum
On Tue, Aug 15, 2017 at 7:03 AM Andreas Calminder < andreas.calmin...@klarna.com> wrote: > Hi, > I got hit with osd suicide timeouts while deep-scrub runs on a > specific pg, there's a RH article > (https://access.redhat.com/solutions/2127471) suggesting changing >

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-15 Thread Mehmet
I am not sure, but perhaps setting nodown/noout could help it finish? - Mehmet On 15 August 2017 16:01:57 MESZ, Andreas Calminder wrote: >Hi, >I got hit with osd suicide timeouts while deep-scrub runs on a >specific pg, there's a RH article
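
For completeness, the flags Mehmet is referring to are cluster-wide and would be set and cleared roughly like this (remember to unset them afterwards so normal failure handling resumes):

    # keep OSDs from being marked down/out while the long deep-scrub runs
    ceph osd set nodown
    ceph osd set noout

    # remove the flags once the deep-scrub has finished
    ceph osd unset nodown
    ceph osd unset noout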

[ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-08-15 Thread Andreas Calminder
Hi, I got hit with osd suicide timeouts while deep-scrub runs on a specific pg. There's a RH article (https://access.redhat.com/solutions/2127471) suggesting changing osd_scrub_thread_suicide_timeout from 60s to a higher value; the problem is that the article is for Hammer and the
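
As the later replies show, on Jewel the scrub work ends up running in the OSD's op worker threads, so the knob that actually mattered was osd_op_thread_suicide_timeout rather than the Hammer-era scrub-thread setting. A hedged sketch for checking the value in effect and re-triggering the scrub (OSD id and PG id are placeholders):

    # show the value currently in effect (run on the node hosting that OSD)
    ceph daemon osd.<id> config get osd_op_thread_suicide_timeout

    # kick off the deep-scrub of the problem PG once the timeout has been raised
    ceph pg deep-scrub <pg-id>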