I investigated this quite a bit, and this appears to be an ntp bug and not a charm bug.
This host is a trusty host, running ntp version 1:4.2.6.p5+dfsg-3ubuntu2.14.04.13. We have other hosts running the same version that don't have the problem described above.

I spent quite some time investigating this, comparing the hosts, running strace etc., and I noticed a subtle difference in /etc/hosts: on the working host, the ::1 entry doesn't have "localhost", but it does on the failing host. When I removed "localhost" from the ::1 entry on the failing host, "ntpq -pn" started working.

Investigating a bit more, I found that on the working host ntpd was listening on ::1, but on the failing host it wasn't (checked in the "ss -anupe" output as well as in the ntpd startup logs).

Comparing straces of ntpd starting up, I think I was able to find what's going on. On the working host it gives (only relevant output is posted here):

3973 19:41:32 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
3973 19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvb1693c156-5f", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
[... the same for a bunch of interfaces - this is a nova compute node so this is expected ...]
3973 19:41:32 close(5) = 0

But on the failing host, it checks a single interface:

56717 19:37:03 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
56717 19:37:03 close(5) = 0

So I thought this interface was a bit special:

$ ip li sh dev qvbba244f00-69
67772: qvbba244f00-69@qvoba244f00-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbrba244f00-69 state UP mode DEFAULT group default qlen 1000
    link/ether 0e:ac:86:b1:c8:24 brd ff:ff:ff:ff:ff:ff

It appears completely normal, except that it has an unusually high ifindex (67772). Could that be the cause of the problem?

Looking at the source code (https://git.launchpad.net/ubuntu/+source/ntp/tree/?h=ubuntu/trusty-updates), interfaces are parsed by reading the /proc/net/if_inet6 file (https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n54), which strace confirms:

3973 19:41:32 open("/proc/net/if_inet6", O_RDONLY) = 6

Each line is read using fgets:

fgets(iter->entry, sizeof(iter->entry), iter->proc) != NULL
https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n181

What's sizeof(iter->entry)? Well, "entry" is defined like this:

char entry[ISC_IF_INET6_SZ];
https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n48

And ISC_IF_INET6_SZ is:

#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")
https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n153

And this is where the problem is. The computation of ISC_IF_INET6_SZ assumes that the ifindex takes 2 chars (in hex), i.e. that the ifindex is < 256. However, ifindexes higher than that are likely common, so why don't we see this bug elsewhere? Because the computation of ISC_IF_INET6_SZ also assumes that the interface name is 16 chars.
In our example, the interface name is "only" 14 chars, so we have 2 chars of slack for the ifindex. But that's not enough: the ifindex 108bc is 5 chars, 3 more than the template allows, so we are off by 1! "00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n" is 62 chars long, so the buffer is 63 bytes and fgets() returns at most 62 chars per call. The first line of if_inet6 on our machine is:

fe800000000000000cac86fffeb1c824 108bc 40 20 80 qvbba244f00-69

and that's 62 chars long... but without the \n! So what might be happening here is that the first iteration of the loop properly reads the whole line except the \n, and the next iteration resumes at that location and, because fgets() stops at EOF or newline, just returns a newline, which makes the whole iteration stop.

The fix here is pretty simple: the computation of ISC_IF_INET6_SZ should assume an ifindex of UINT_MAX, i.e. ffffffff (or any 8-char number).

If I can trust https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=applied/ubuntu/jammy this is still present in Jammy.

Redirecting the bug to the "ntp" package.

** Also affects: ntp (Ubuntu)
Importance: Undecided
Status: New

** Changed in: ntp-charm
Status: New => Invalid

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1952264

Title: ntp sync checks fail when server as no IPv6 connectivity

Status in NTP Charm: Invalid
Status in ntp package in Ubuntu: New

Bug description:

This charm sets up ntpmon and nagios checks to alert when ntp was not able to select a sync peer.

On a server without a routable ipv6 configured, ntpq -p fails with:

$ ntpq -p
localhost: timed out, nothing received
***Request timed out

$ /opt/ntpmon-ntp-charm/check_ntpmon.py --check sync
CRITICAL: No sync peer selected | frequency= offset=nan peers=0 reach=nan result=2 rootdelay= rootdisp= runtime= stratum= sync=0.000000 sysjitter= sysoffset= tracehosts= traceloops= tracetime=

This results in a nagios alert complaining about the problem.

Although:

$ ntpq -p -4
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*hostname1       xxx.xxx.xxx.x    2 u  210  256  377    0.842    0.031   0.050
+hostname2       xxx.xxx.xxx.x    2 u   88  256  377    0.327    0.062   0.107
-hostname3       xxx.xxx.xxx.x    2 u  210  256  377   75.810   -1.198   1.035
+hostname4       xxx.xxx.xxx.x    2 u   68  256  377    0.751    0.078   0.193

$ ntpq -p -4 | /opt/ntpmon-ntp-charm/check_ntpmon.py --check sync --test
OK: Time is in sync with hostname1 | frequency= offset=0.000057 peers=4 reach=100.000000 result=0 rootdelay= rootdisp= runtime= stratum= sync=1.000000 sysjitter= sysoffset= tracehosts= traceloops= tracetime=

Maybe this is a bug to file against ntp itself? Or some configuration could allow ntpq -p and check_ntpmon.py to succeed?

I've tested running ntpd with -4 (using defaults file) but with no luck.

Let us know if you need more information.

Thank you,
Loïc