I have had defer_client_eviction_on_laggy_osds set to false for a while
and I haven't had any further warnings so far (obviously), but also all
the other problems with laggy clients bringing our MDS to a crawl over
time seem to have gone. So at least on our cluster, the new configurable
seems
Hi,
I took a snapshot of MDS.0's logs. We have five active MDS in total,
each one reporting laggy OSDs/clients, but I cannot find anything
related to that in the log snippet. Anyhow, I uploaded the log for your
reference with ceph-post-file ID 79b5138b-61d7-4ba7-b0a9-c6f02f47b881.
This is
Hi Janek,
The PR Venky mentioned uses the OSDs' laggy parameters (laggy_interval and
laggy_probability) to determine whether any OSD is laggy. These laggy
parameters can reset to 0 if the interval between the last modification
made to the OSDMap and the timestamp when the OSD was marked down exceeds
Hey Janek,
I took a closer look at various places where the MDS would consider a
client as laggy and it seems like a wide variety of reasons are taken
into consideration and not all of them might be a reason to defer client
eviction, so the warning is a bit misleading. I'll post a PR for this. In
Hi Janek,
On Tue, Sep 19, 2023 at 4:44 PM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:
> Hi Venky,
>
> As I said: There are no laggy OSDs. The maximum ping I have for any OSD in
> ceph osd perf is around 60ms (just a handful, probably aging disks). The
> vast majority of OSDs have
Hi Venky,
As I said: There are no laggy OSDs. The maximum ping I have for any OSD
in ceph osd perf is around 60ms (just a handful, probably aging disks).
The vast majority of OSDs have ping times of less than 1ms. Same for the
host machines, yet I'm still seeing this message. It seems that
Hi Janek,
On Mon, Sep 18, 2023 at 9:52 PM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:
> Thanks! However, I still don't really understand why I am seeing this.
>
This is due to a change that was merged recently in Pacific:
https://github.com/ceph/ceph/pull/52270
The MDS
Thanks! However, I still don't really understand why I am seeing this.
The first time I had this, one of the clients was a remote user dialling
in via VPN, which could indeed be laggy. But I am also seeing it from
neighbouring hosts that are on the same physical network with reliable
ping
Hi Janek,
There was some documentation added about it here:
https://docs.ceph.com/en/pacific/cephfs/health-messages/
There is a description of what it means, and it's tied to an mds
configurable.
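
For reference, a minimal sketch of how one might inspect or change that
configurable from a script. It assumes the option meant here is
defer_client_eviction_on_laggy_osds (the name quoted elsewhere in this
thread) and that the standard `ceph config get` / `ceph config set`
commands are available on an admin host. The helper names config_cmd and
run are hypothetical, not part of Ceph.

```python
import subprocess

# Option name as quoted elsewhere in this thread (an assumption that this is
# the configurable the documentation refers to).
OPTION = "defer_client_eviction_on_laggy_osds"


def config_cmd(action, value=None, who="mds"):
    """Build a `ceph config get|set` argument list without running it."""
    if action == "get":
        return ["ceph", "config", "get", who, OPTION]
    if action == "set":
        if value is None:
            raise ValueError("set needs a value")
        # Booleans are passed as lowercase strings, e.g. "false".
        return ["ceph", "config", "set", who, OPTION, str(value).lower()]
    raise ValueError(f"unsupported action: {action}")


def run(cmd):
    """Run the command; needs a host with admin access to the cluster."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Only print the commands here; call run(...) to actually execute them.
    print(" ".join(config_cmd("get")))
    print(" ".join(config_cmd("set", False)))
```

Building the argument list separately from running it makes it easy to
review the exact command before anything touches the cluster.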
On Mon, Sep 18, 2023 at 10:51 AM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:
> Hey