[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Dan van der Ster
Hi Michel, Are you sure there isn't a hardware problem with the disk? E.g. maybe you have SCSI timeouts in dmesg or high %util in iostat? Anyway, I don't think there's a big risk related to draining and stopping the OSD. Just consider this a disk failure, which can happen at any time anyway. S
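
For readers following along, the "drain, then stop" idea can be sketched as a short command plan. This is only an illustration, not the procedure used in this thread: the helper name `drain_plan` is hypothetical, the OSD id is a placeholder, and the `ceph-osd@<id>` systemd unit assumes a package-based (non-containerized) deployment. The function only builds the commands; it does not run them.

```python
from typing import List


def drain_plan(osd_id: int) -> List[List[str]]:
    """Build (but do not run) a command sequence for draining one OSD.

    Hypothetical helper for illustration. Assumes a package-based
    deployment where the daemon runs as the ceph-osd@<id> systemd unit.
    """
    osd = f"osd.{osd_id}"
    return [
        # Mark the OSD out so its data is rebalanced onto other OSDs.
        ["ceph", "osd", "out", str(osd_id)],
        # Re-run until it succeeds: reports whether the OSD can be
        # removed without reducing data durability.
        ["ceph", "osd", "safe-to-destroy", osd],
        # Only then stop the daemon (unit name is an assumption, see above).
        ["systemctl", "stop", f"ceph-osd@{osd_id}"],
    ]


if __name__ == "__main__":
    # 12 is a placeholder OSD id, not taken from the thread.
    for cmd in drain_plan(12):
        print(" ".join(cmd))
```

Keeping the plan as data makes it easy to review before anything touches the cluster; the middle step is meant to be polled, since rebalancing can take a long time.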

[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Michel Jouvin
Hi Dan, Thanks for your quick answer. No, I checked: there is really nothing in dmesg or /var/log/messages. We'll try to remove it, either gracefully or abruptly. Cheers, Michel On 16/10/2022 at 22:16, Dan van der Ster wrote: Hi Michel, Are you sure there isn't a hardware problem with the disk? E.g.

[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Frank Schilder
A disk may be failing without smartctl or other tools showing anything. Does it have remapped sectors? I would just throw the disk out and get a new one. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michel Jouv
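
As a rough illustration of the "remapped sectors" check, the sketch below pulls the raw `Reallocated_Sector_Ct` value out of ATA-style `smartctl -A` output. The function name and the `SAMPLE` text are made up for illustration (the sample is not output from the disk in this thread), and SAS/NVMe devices report defects differently, so a `None` result does not mean the disk is healthy.

```python
import re
from typing import Optional

# Invented sample in the ATA attribute-table layout printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       24310
"""


def reallocated_sectors(smartctl_output: str) -> Optional[int]:
    """Return the raw Reallocated_Sector_Ct value, or None if not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


if __name__ == "__main__":
    print(reallocated_sectors(SAMPLE))
```

In practice the input would come from running `smartctl -A` on the OSD's device; a non-zero and growing count is a common reason to replace a disk, as suggested above.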

[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-17 Thread Michel Jouvin
Hi, In fact, it was a very stupid mistake. This is a CentOS 8 system where smartd was not installed. After installing and starting it, the OSD device is indeed in bad shape, with many reported errors, which explains the observed behaviour. We managed to gracefully drain the sick OSD using the approach p