[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Frank Schilder
A disk may be failing without smartctl or other tools showing anything. Does it have remapped sectors? I would just throw the disk out and get a new one.

Best regards,
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
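A quick way to check for remapped sectors is SMART attribute 5 (Reallocated_Sector_Ct). A minimal sketch follows; the real command (shown as a comment) needs root on the OSD host and a real device path, so the runnable part parses a sample smartctl attribute line instead:

```shell
# Real check on the OSD host (as root, /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | grep -i reallocated
# Sample attribute line; the last field is the raw reallocated-sector count.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       24'
count=$(echo "$sample" | awk '{print $NF}')
if [ "$count" -gt 0 ]; then
  echo "WARN: $count reallocated sectors - consider replacing the disk"
fi
```

A nonzero and, above all, a growing raw count is the "failing without other tools showing anything" case: the drive silently remaps sectors until it runs out of spares.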

[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Michel Jouvin
Hi Dan,

Thanks for your quick answer. No, I checked: really nothing in dmesg or /var/log/messages. We'll try to remove it either gracefully or abruptly.

Cheers,
Michel

On 16/10/2022 at 22:16, Dan van der Ster wrote:
Hi Michel, Are you sure there isn't a hardware problem with the disk? E.g.

[ceph-users] Re: 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Dan van der Ster
Hi Michel, Are you sure there isn't a hardware problem with the disk? E.g. maybe you have SCSI timeouts in dmesg or high I/O utilization with iostat? Anyway, I don't think there's a big risk related to draining and stopping the OSD. Just consider this a disk failure, which can happen at any time anyway. S
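The checks and the graceful drain Dan describes can be sketched as below. The live-cluster commands are shown as comments (they need root and a running cluster; the OSD id 42 and device sdf are placeholders); the runnable part detects a SCSI timeout in a sample kernel log line so the snippet is self-contained:

```shell
# Hardware checks on the OSD host (require root):
#   dmesg -T | grep -iE 'scsi|I/O error|blk_update_request'
#   iostat -x 5        # one disk pinned near 100 %util points at hardware
# Graceful drain (placeholder OSD id 42, non-cephadm unit name assumed):
#   ceph osd out 42    # backfill moves PGs off; watch "ceph -s" until clean
#   systemctl stop ceph-osd@42
# Self-contained part: spot a SCSI timeout in a sample dmesg line.
line='[Sun Oct 16 21:40:01 2022] sd 0:0:5:0: [sdf] tag#12 timing out command, waited 180s'
if echo "$line" | grep -iq 'timing out\|i/o error'; then
  echo "disk error found in kernel log"
fi
```

Marking the OSD out first and waiting for recovery is the "graceful" path; stopping it abruptly is also safe for data (the cluster treats it as an ordinary disk failure), it just triggers recovery after the fact instead of before.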

[ceph-users] 1 OSD laggy: log_latency_fn slow; heartbeat_map is_healthy had timed out after 15

2022-10-16 Thread Michel Jouvin
Hi, We have a production cluster made of 12 OSD servers with 16 OSDs each (all the same HW) which has been running fine for 5 years (initially installed with Luminous). It has been running Octopus (15.2.16) for 1 year and was recently upgraded to 15.2.17 (1 week before the problem starte

[ceph-users] Re: pool size ...

2022-10-16 Thread Janne Johansson
> Hi,
> I've seen Dan's talk:
> https://www.youtube.com/watch?v=0i7ew3XXb7Q
> and other similar ones that talk about CLUSTER size.
> But, I see nothing (perhaps I have not looked hard enough), on any
> recommendations regarding max POOL size.
> So, are there any limitations on a given pool that ha

[ceph-users] Spam on /var/log/messages due to config leftover?

2022-10-16 Thread Nicola Mori
Dear Ceph users, on one of my nodes I see that /var/log/messages is being spammed by these messages:

Oct 16 12:51:11 bofur bash[2473311]: :::172.16.253.2 - - [16/Oct/2022:10:51:11] "GET /metrics HTTP/1.1" 200 - "" "Prometheus/2.33.4"
Oct 16 12:51:12 bofur bash[2487821]: ts=2022-10-16T

[ceph-users] Re: pool size ...

2022-10-16 Thread Eugen Block
Hi, for a replicated pool there's a hard-coded limit of 10:

$ ceph osd pool set test-pool size 20
Error EINVAL: pool size must be between 1 and 10

And it seems reasonable to limit a replicated pool; so many replicas increase the cost and network traffic without much added benefit.
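The cost argument is easy to make concrete: a replicated pool consumes data × size of raw capacity, so each extra replica adds a full copy's worth of disk and replication traffic. A small sketch, assuming a hypothetical 100 TiB of user data:

```shell
data_tib=100   # hypothetical amount of user data in the pool
for size in 2 3 5 10; do
  echo "size=$size -> raw usage: $((data_tib * size)) TiB"
done
```

At size=10 the same 100 TiB of data occupies 1000 TiB raw, while the durability gain over size=3 or 4 is marginal for almost any realistic failure model, which is why such large replica counts are rarely worth it.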