[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-29 Thread Dan van der Ster
On Sat, Apr 10, 2021 at 2:10 AM Robert LeBlanc wrote:
>
> On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster wrote:
> >
> > Here's what you should look for, with debug_mon=10. It shows clearly
> > that it takes the mon 23 seconds to run through
> > get_removed_snaps_range.
> > So if this is

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-12 Thread Brad Hubbard
On Tue, Apr 13, 2021 at 8:40 AM Robert LeBlanc wrote:
>
> Do you think it would be possible to build Nautilus FUSE or newer on
> 14.04, or do you think the toolchain has evolved too much since then?
>
An interesting question.
# cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.6 LTS, Trusty

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-12 Thread Robert LeBlanc
On Mon, Apr 12, 2021 at 3:41 PM Brad Hubbard wrote:
>
> Sure Robert,
>
> I understand the realities of maintaining large installations which
> may have many reasons holding them back from upgrading any of the
> interdependent software they run. The other side of the page however
> is that we can

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-12 Thread Brad Hubbard
On Mon, Apr 12, 2021 at 11:35 AM Robert LeBlanc wrote:
>
> On Sun, Apr 11, 2021 at 4:19 PM Brad Hubbard wrote:
> >
> > PSA.
> >
> > https://docs.ceph.com/en/latest/releases/general/#lifetime-of-stable-releases
> >
> > https://docs.ceph.com/en/latest/releases/#ceph-releases-index
>
> I'm very

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-11 Thread Robert LeBlanc
On Sun, Apr 11, 2021 at 4:19 PM Brad Hubbard wrote:
>
> PSA.
>
> https://docs.ceph.com/en/latest/releases/general/#lifetime-of-stable-releases
>
> https://docs.ceph.com/en/latest/releases/#ceph-releases-index
I'm very well aware that we are living on the dying edge (well, past dead), but a good

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-11 Thread Brad Hubbard
PSA.
https://docs.ceph.com/en/latest/releases/general/#lifetime-of-stable-releases
https://docs.ceph.com/en/latest/releases/#ceph-releases-index
On Sat, Apr 10, 2021 at 10:11 AM Robert LeBlanc wrote:
>
> On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster wrote:
> >
> > Here's what you should

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster wrote:
>
> Here's what you should look for, with debug_mon=10. It shows clearly
> that it takes the mon 23 seconds to run through
> get_removed_snaps_range.
> So if this is happening every 30s, it explains at least part of why
> this mon is busy.
>
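
To put the two figures quoted above in proportion, here is a minimal sketch of the arithmetic. It assumes the 23-second pass through get_removed_snaps_range really does recur every 30 seconds (the message only says "if") and that it occupies a single thread; the helper name `busy_fraction` is hypothetical and not part of Ceph.

```python
def busy_fraction(busy_seconds: float, period_seconds: float) -> float:
    """Return the share of each period that one thread spends in the slow call.

    Assumption: the call runs once per period on a single thread.
    The result is capped at 1.0, since a thread cannot be more than fully busy.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return min(busy_seconds / period_seconds, 1.0)


if __name__ == "__main__":
    # Figures taken from the message: 23 s of work, possibly every 30 s.
    fraction = busy_fraction(23, 30)
    print(f"about {fraction:.0%} of each 30 s window")
```

Under those assumptions the call alone would keep one thread busy for roughly three quarters of the time, which fits the remark that it explains at least part of the load.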

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 11:50 PM Robert LeBlanc wrote:
>
> On Fri, Apr 9, 2021 at 2:04 PM Dan van der Ster wrote:
> >
> > On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster wrote:
> > >
> > > On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc
> > > wrote:
> > > >
> > > > On Fri, Apr 9, 2021 at 11:49

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 2:04 PM Dan van der Ster wrote:
>
> On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster wrote:
> >
> > On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
> > >
> > > On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster
> > > wrote:
> > > >
> > > > Thanks. I didn't see

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster wrote:
>
> On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
> >
> > On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster
> > wrote:
> > >
> > > Thanks. I didn't see anything ultra obvious to me.
> > >
> > > But I did notice the nearfull warnings

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
>
> On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster wrote:
> >
> > Thanks. I didn't see anything ultra obvious to me.
> >
> > But I did notice the nearfull warnings so I wonder if this cluster is
> > churning through osdmaps? Did you see a

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Stefan Kooman
On 4/9/21 3:40 PM, Robert LeBlanc wrote:
> I'm attempting to deep scrub all the PGs to see if that helps clear up
> some accounting issues, but that's going to take a really long time on
> 2PB of data.
Are you running with 1 mon now? Have you tried adding mons from scratch? So with a fresh

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster wrote:
>
> Thanks. I didn't see anything ultra obvious to me.
>
> But I did notice the nearfull warnings so I wonder if this cluster is
> churning through osdmaps? Did you see a large increase in inbound or
> outbound network traffic on this mon

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 7:24 PM Robert LeBlanc wrote:
>
> On Fri, Apr 9, 2021 at 11:05 AM Dan van der Ster wrote:
> >
> > Hi Robert,
> >
> > Have you checked a log with debug_mon=20 yet to try to see what it's doing?
> >
> I've posted the logs with debug_mon=20 for a period during high CPU
> here

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 11:05 AM Dan van der Ster wrote:
>
> Hi Robert,
>
> Have you checked a log with debug_mon=20 yet to try to see what it's doing?
>
I've posted the logs with debug_mon=20 for a period during high CPU here
https://owncloud.leblancnet.us/owncloud/index.php/s/OtHsBAYN9r5eSbU

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
Hi Robert,
Have you checked a log with debug_mon=20 yet to try to see what it's doing?
.. Dan
On Fri, Apr 9, 2021, 7:02 PM Robert LeBlanc wrote:
> The only step not yet taken was to move to straw2. That was the last
> step we were going to do next.
>
> Robert LeBlanc
> PGP

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
The only step not yet taken was to move to straw2. That was the last step we were going to do next.
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Fri, Apr 9, 2021 at 10:41 AM Robert LeBlanc wrote:
>
> On Fri, Apr 9, 2021 at 9:25 AM Stefan

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 9:25 AM Stefan Kooman wrote:
> Are you running with 1 mon now? Have you tried adding mons from scratch?
> So with a fresh database? And then maybe after they have joined, kill
> the donor mon and start from scratch.
>
> You have for sure not missed a step during the upgrade

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
I'm attempting to deep scrub all the PGs to see if that helps clear up some accounting issues, but that's going to take a really long time on 2PB of data.
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Thu, Apr 8, 2021 at 9:48 PM Robert

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
Good thought. The storage for the monitor data is a RAID-0 over three NVMe devices. Watching iostat, they are completely idle, maybe 0.8% to 1.4% for a second every minute or so.
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Thu, Apr 8, 2021

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Stefan Kooman
On 4/8/21 6:22 PM, Robert LeBlanc wrote:
> I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
> converted the last batch of FileStore OSDs to BlueStore about 36 hours ago.
> Yesterday our monitor cluster went nuts and started constantly calling
> elections because monitor nodes were

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
I found this thread that matches a lot of what I'm seeing. I see the ms_dispatch thread going to 100%, but I'm at a single MON, the recovery is done and the rocksdb MON database is ~300MB. I've tried all the settings mentioned in that thread with no noticeable improvement. I was hoping that once

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
On Thu, Apr 8, 2021 at 11:24 AM Robert LeBlanc wrote:
>
> On Thu, Apr 8, 2021 at 10:22 AM Robert LeBlanc wrote:
> >
> > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
> > converted the last batch of FileStore OSDs to BlueStore about 36 hours ago.
> > Yesterday our

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
On Thu, Apr 8, 2021 at 10:22 AM Robert LeBlanc wrote:
>
> I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
> converted the last batch of FileStore OSDs to BlueStore about 36 hours ago.
> Yesterday our monitor cluster went nuts and started constantly calling
> elections