[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Tim Holloway
Thank you, Redouane! Some background. I migrated to Ceph amidst a Perfect Storm. The Ceph docs, as I've often complained, were/are a horrible mish-mash of deprecated instructions and more modern information. So, among other things, I ended up with a mess of resources, some legacy-based, some mana…

2024-09-06 Matthew Vernon
On 06/09/2024 10:27, Matthew Vernon wrote: On 06/09/2024 08:08, Redouane Kachach wrote: That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by…

2024-09-06 Matthew Vernon
Hi, On 06/09/2024 08:08, Redouane Kachach wrote: That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by Ceph) is updated automatically to point…

2024-09-06 Redouane Kachach
Hi Matthew, That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by Ceph) is updated automatically to point to the new active mgr. Unfortunately it's…
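The "point Prometheus at the new active mgr" step can be illustrated with a small Python sketch: read the active mgr from `ceph mgr dump --format json` and build a discovery URL for it, bracketing IPv6 literals as URLs require. This is not how cephadm itself generates the Prometheus configuration. The `active_addr` field and its `ip:port/nonce` form, the `https` default, and the `/sd/prometheus/sd-config?service=mgr-prometheus` path are assumptions to check against your own cluster and generated prometheus.yml; `host_from_legacy_addr` and `sd_url` are hypothetical helper names.

```python
import ipaddress
import json

# ASSUMPTION: discovery path as seen in cephadm-generated prometheus.yml files;
# verify it against the http_sd_configs URL in your own deployment.
SD_PATH = "/sd/prometheus/sd-config?service=mgr-prometheus"


def host_from_legacy_addr(addr: str) -> str:
    """Extract the IP from an (assumed) legacy Ceph address such as
    '10.0.0.5:6800/1234' or '[2001:db8::5]:6800/1234'."""
    hostport = addr.split("/", 1)[0]      # drop the '/nonce' suffix
    host = hostport.rpartition(":")[0]    # drop the messenger port
    return host.strip("[]")               # remove IPv6 brackets if present


def sd_url(mgr_dump_json: str, port: int = 8765, scheme: str = "https") -> str:
    """Build a service-discovery URL for the currently active mgr.

    `port` mirrors cephadm's service_discovery_port (8765 by default);
    `scheme` is an assumption and may need to be 'http' on your release.
    """
    dump = json.loads(mgr_dump_json)
    host = host_from_legacy_addr(dump["active_addr"])
    # IPv6 literals must be wrapped in brackets inside a URL; IPv4 must not be.
    if ipaddress.ip_address(host).version == 6:
        host = f"[{host}]"
    return f"{scheme}://{host}:{port}{SD_PATH}"


if __name__ == "__main__":
    # Made-up example input, not real cluster output.
    sample = '{"active_name": "mgr-a", "active_addr": "[2001:db8::5]:6800/1234"}'
    print(sd_url(sample))
```

Forgetting the brackets (producing something like `https://2001:db8::5:8765/...`) yields an unparseable URL, which is one reason IPv6-only clusters tend to surface bugs like this first.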

2024-09-05 Tim Holloway
Now you've got me worried. As I said, there is absolutely no traffic using port 8765 on my LAN. Am I missing a service? Since my distro is based on stock Prometheus, I'd have to assume that the port 8765 server would be part of the Ceph generic container image and isn't being switched on for some…

2024-09-05 Matthew Vernon
On 05/09/2024 15:03, Matthew Vernon wrote: Hi, On 05/09/2024 12:49, Redouane Kachach wrote: The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is opened in the act…

2024-09-05 Matthew Vernon
Hi, On 05/09/2024 12:49, Redouane Kachach wrote: The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is opened in the active mgr and the service is used by prometheu…

2024-09-05 Redouane Kachach
Hi, The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is opened in the active mgr and the service is used by prometheus (server) to get the targets by using the http…
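The "get the targets by using the http…" step presumably refers to Prometheus's HTTP service discovery (`http_sd_configs`), which expects the endpoint to return a JSON list of target groups, each with a `targets` list and an optional `labels` mapping. A minimal Python sketch of fetching and flattening such a response; `fetch_http_sd` and `flatten_http_sd` are hypothetical names, and the fetch helper does no authentication or custom TLS handling, which a secured cephadm endpoint may require.

```python
import json
import urllib.request
from typing import Dict, List, Tuple


def fetch_http_sd(url: str, timeout: float = 5.0) -> str:
    """Fetch a raw HTTP SD response body (no auth, default TLS verification)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def flatten_http_sd(body: str) -> List[Tuple[str, Dict[str, str]]]:
    """Turn an HTTP SD response into (target, labels) pairs.

    The body is expected to be a JSON list of target groups such as
    [{"targets": ["host:9283"], "labels": {"key": "value"}}].
    """
    pairs = []
    for group in json.loads(body):
        labels = group.get("labels", {})
        for target in group.get("targets", []):
            pairs.append((target, dict(labels)))  # copy labels per target
    return pairs
```

Printing `flatten_http_sd(fetch_http_sd(url))` against the discovery URL from your prometheus.yml is a quick way to see which scrape targets the active mgr is advertising.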

2024-09-05 Eugen Block
Hi, regarding the scraping endpoints, I wonder if it would make sense to handle it the same way as with the dashboard redirect:
    ceph config get mgr mgr/dashboard/standby_behaviour
    redirect
If you try to access the dashboard via one of the standby MGRs, you're redirected to the active one.
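Until something like the dashboard's standby redirect exists for the discovery endpoint, an external scraper could simply try each mgr host in turn. A Python sketch, assuming you maintain the list of mgr hosts yourself (the host list and `first_listening` are hypothetical), that returns the first host accepting a TCP connection on port 8765. It checks reachability only, not whether the HTTP endpoint behind the port answers correctly.

```python
import socket
from typing import Iterable, Optional


def first_listening(hosts: Iterable[str], port: int = 8765,
                    timeout: float = 1.0) -> Optional[str]:
    """Return the first host that accepts a TCP connection on `port`, else None.

    Because the discovery server only runs on the active mgr, at most one
    mgr host is expected to answer.
    """
    for host in hosts:
        try:
            # create_connection resolves the name and tries IPv6/IPv4 as needed.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            continue  # refused, unreachable, or timed out: try the next mgr
    return None
```

The design choice here is deliberately dumb: a TCP probe needs no credentials and works on IPv4 and IPv6 alike, at the cost of not detecting an endpoint that is listening but broken.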

2024-09-04 Tim Holloway
I've been monitoring my Ceph LAN segment for the last several hours and absolutely no traffic has shown up on any server for port 8765. Furthermore, I did a quick review of Prometheus itself and it's only claiming those 9000-series ports I mentioned previously. So I conclude that this isn't litera…

2024-09-04 Matthew Vernon
Hi, I tracked it down to 2 issues:
* our ipv6-only deployment (a bug fixed in 18.2.4, though that has buggy .debs)
* Discovery service is only run on the active mgr
The latter point is surely a bug? Isn't the point of running a service discovery endpoint that one could point e.g. an externa…

2024-09-03 Tim Holloway
You may find this interesting. I'm running Pacific from the Red Hat repo and Prometheus was given its own discrete container image, not the generic Ceph one. Rather than build custom Prometheus, Red Hat used the Prometheus project's own containers. In fact, it has 3: one for Prometheus, one for P…

2024-09-03 Tim Holloway
Yeah. Although taming the Prometheus logs is on my list, I'm still fuzzy on its details. For your purposes, Docker and Podman can be considered as equivalent. I also run under Podman, incidentally. If the port isn't open inside the container, then blame Prometheus. I'd consider bumping its loggin…

2024-09-03 Matthew Vernon
Hi, On 03/09/2024 14:27, Tim Holloway wrote: FWIW, I'm using podman not docker. The netstat command is not available in the stock Ceph containers, but the "ss" command is, so use that to see if there is in fact a process listening on that port. I have done this, and there's nothing listening…
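The `ss` check can be scripted. A Python sketch that runs `ss -ltn` wherever `ss` is available and reports whether anything is in LISTEN state on a given port. It assumes the usual column layout, with the local `address:port` in the fourth column; `listening_ports` and `is_listening` are hypothetical names.

```python
import subprocess
from typing import List


def listening_ports(ss_output: str) -> List[int]:
    """Extract local ports from `ss -ltn` output (LISTEN rows only)."""
    ports = []
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "LISTEN":
            continue  # skip the header row and anything unexpected
        local = fields[3]  # e.g. '0.0.0.0:9283' or '[::]:8765'
        ports.append(int(local.rpartition(":")[2]))
    return ports


def is_listening(port: int) -> bool:
    """Run `ss -ltn` locally and check whether `port` has a listener."""
    out = subprocess.run(["ss", "-ltn"], capture_output=True, text=True,
                         check=True).stdout
    return port in listening_ports(out)
```

Taking the text after the last colon handles both IPv4 (`0.0.0.0:8765`) and bracketed IPv6 (`[::]:8765`) local addresses, so a listener bound only to one address family still shows up.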

2024-09-03 Tim Holloway
While I generally don't recommend getting down and dirty with the containers in Ceph, if you're going to build your own, well, that's different. When I have a container and the expected port isn't listening, the first thing I do is see if it's really listening and internal-only or truly not listen…

2024-09-03 Matthew Vernon
On 03/09/2024 13:33, Eugen Block wrote: Oh that's interesting :-D I have no explanation for that, except maybe some flaw in your custom images? Or in the service specs? Not sure, to be honest... So obviously it _could_ be something in our images, but we're using Ceph's published .debs (18.2.2…

2024-09-03 Eugen Block
Oh that's interesting :-D I have no explanation for that, except maybe some flaw in your custom images? Or in the service specs? Not sure, to be honest... Quoting Matthew Vernon: Hi, On 03/09/2024 11:46, Eugen Block wrote: Do you see the port definition in the unit.meta file? Oddly:…

2024-09-03 Matthew Vernon
Hi, On 03/09/2024 11:46, Eugen Block wrote: Do you see the port definition in the unit.meta file? Oddly: "ports": [ 9283, 8765, 8765, 8765, 8765 ], which doesn't look right... Regards, Matthew

2024-09-03 Eugen Block
Do you see the port definition in the unit.meta file?
    jq -r '.ports' /var/lib/ceph/{FSID}/mgr.{MGR}/unit.meta
    [ 8443, 9283, 8765 ]
Quoting Matthew Vernon: Hi, On 02/09/2024 21:24, Eugen Block wrote: Without having looked too closely, do you run ceph with IPv6? There’s a tracker is…
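The `jq` check above can be extended to flag repeated entries such as the `8765, 8765, 8765, 8765` reported in this thread. A Python sketch, assuming `unit.meta` is a JSON object with a top-level `ports` list, as the `jq -r '.ports'` command implies; `meta_ports` and `check_unit_meta` are hypothetical names, and `fsid`/`mgr` stand for your cluster FSID and mgr daemon name (the `{FSID}` and `{MGR}` placeholders in the path).

```python
import json
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def meta_ports(text: str) -> Tuple[List[int], List[int]]:
    """Return (ports, ports listed more than once) from unit.meta contents."""
    ports = json.loads(text).get("ports", [])
    dupes = sorted(p for p, n in Counter(ports).items() if n > 1)
    return ports, dupes


def check_unit_meta(fsid: str, mgr: str) -> Tuple[List[int], List[int]]:
    """Read /var/lib/ceph/<fsid>/mgr.<mgr>/unit.meta and inspect its ports."""
    path = Path(f"/var/lib/ceph/{fsid}/mgr.{mgr}/unit.meta")
    return meta_ports(path.read_text())
```

A non-empty second element is a hint that the port list was appended to repeatedly rather than regenerated; whether that matters for the listener is a separate question.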

2024-09-03 Matthew Vernon
Hi, On 02/09/2024 21:24, Eugen Block wrote: Without having looked too closely, do you run ceph with IPv6? There’s a tracker issue: https://tracker.ceph.com/issues/66426 It will be backported to Reef. I do run IPv6, but the problem is that nothing is listening on port 8765 at all, not that…

2024-09-02 Eugen Block
Without having looked too closely, do you run ceph with IPv6? There’s a tracker issue: https://tracker.ceph.com/issues/66426 It will be backported to Reef. Quoting Matthew Vernon: Hi, I'm running reef, with locally-built containers based on upstream .debs. I've now enabled prometheus m…