[ceph-users] Re: 14.2.20: Strange monitor problem eating 100% CPU

2021-05-04 Thread Janne Johansson
On Tue, May 4, 2021 at 16:10, Rainer Krienke wrote:
> Hello,
> I am playing around with a test ceph 14.2.20 cluster. The cluster
> consists of 4 VMs, each VM has 2 OSDs. The first three VMs vceph1,
> vceph2 and vceph3 are monitors. vceph1 is also mgr.
> What I did was quite simple. The cluster is in

2021-05-04 Thread Dan van der Ster
Hi,

This sounds a lot like the negative progress bug we just found last week: https://tracker.ceph.com/issues/50591

That bug makes the mon enter a very long loop rendering a progress bar if the mgr incorrectly sends a message to the mon that the progress is negative. Octopus and later don't have
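To make the failure mode above concrete, here is a minimal sketch of how a negative progress value could turn into a very long rendering loop. It is not Ceph's code: the function names, the 20-cell bar width, and the 32-bit unsigned conversion are all assumptions chosen for illustration, and the actual cause is described in the tracker issue. The second function shows the usual defensive measure of clamping the value before drawing.

```python
def to_uint32(n: int) -> int:
    """Mimic converting a signed integer to a 32-bit unsigned one (assumption)."""
    return n & 0xFFFFFFFF


def filled_cells_unclamped(progress: float, width: int = 20) -> int:
    """Hypothetical renderer step: number of bar cells to draw, with no range check.

    A negative progress gives a negative cell count, which wraps around to a
    huge value once it is treated as unsigned.
    """
    return to_uint32(int(progress * width))


def render_bar(progress: float, width: int = 20) -> str:
    """Render a text progress bar, clamping progress to the range [0.0, 1.0]."""
    clamped = min(max(progress, 0.0), 1.0)
    filled = int(clamped * width)
    return "[" + "=" * filled + "." * (width - filled) + "]"


if __name__ == "__main__":
    # Unclamped: a loop over this many cells would take a very long time.
    print(filled_cells_unclamped(-0.5))
    # Clamped: negative progress is drawn as an empty bar.
    print(render_bar(-0.5))
    print(render_bar(0.5))
```

With clamping, a bogus negative value from the sender costs at most one empty bar instead of billions of loop iterations.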

2021-05-04 Thread Dan van der Ster
On Tue, May 4, 2021 at 4:21 PM Janne Johansson wrote:
>
> On Tue, May 4, 2021 at 16:10, Rainer Krienke wrote:
> > Hello,
> > I am playing around with a test ceph 14.2.20 cluster. The cluster
> > consists of 4 VMs, each VM has 2 OSDs. The first three VMs vceph1,
> > vceph2 and vceph3 are monitors. v

2021-05-04 Thread Janne Johansson
On Tue, May 4, 2021 at 16:29, Dan van der Ster wrote:
> BTW, if you find that this is indeed what's blocking your mons, you
> can work around it by setting `ceph progress off` until the fixes are
> released.

Most ceph commands (and a few of the ceph daemon commands) would just block, so I guess one wou

2021-05-04 Thread Dan van der Ster
On Tue, May 4, 2021 at 4:34 PM Janne Johansson wrote:
>
> On Tue, May 4, 2021 at 16:29, Dan van der Ster wrote:
> > BTW, if you find that this is indeed what's blocking your mons, you
> > can work around it by setting `ceph progress off` until the fixes are
> > released.
>
> Most ceph commands (and a f

2021-05-04 Thread Rainer Krienke
Hello Dan,

I checked whether I see the negative "Updated progress" messages, and I actually do. At 07:32:00 I started osd.2 again and then ran `ceph -s` a few times until the rebalance started, at which point `ceph -s` finally hung. In the mgr log I see this: https://cloud.uni-koblenz.de/s/CegqBT7pi9nobk4 At th