Bug#336230: NTPD not working after Debian upgrade (V3.0 -> V3.1)

Eberhard Spittler Fri, 28 Oct 2005 12:19:28 -0700

Package: ntp-server
Version: 1:4.2.0a+stable-2sarge1
architecture: i386

activy:~# uname -a
Linux activy 2.4.27-2-686 #1 Mon May 16 17:03:22 JST 2005 i686 GNU/Linux

activy:~# ls -l /lib/libc.so.6
lrwxrwxrwx  1 root root 13 Oct 10 07:24 /lib/libc.so.6 -> libc-2.3.2.so


Installed ntp packages: ntp, ntp-doc, ntp-server, ntp-simple, ntpdate.

PROBLEM: after the system upgrade, ntpd starts as 2 instances at boot
time, but both die within 1-2 minutes.

From "ps fax":

  907 ?        SLs    0:00 ntpd
  912 ?        S      0:00  \_ ntpd


The process IDs in the following log extract do not match, as
the two snapshots were taken on different occasions ...

Log example:

19 Oct 06:32:42 ntpd[425]: frequency initialized 79.130 PPM from /var/lib/ntp/ntp.drift
19 Oct 06:32:52 ntpd[458]: signal_no_reset: signal 17 had flags 4000000
19 Oct 06:32:54 ntpd[458]: signal_no_reset: signal 14 had flags 4000000
19 Oct 06:33:24 ntpd[458]: parent died before we finished, exiting
20 Oct 06:36:20 ntpd[425]: frequency initialized 79.130 PPM from /var/lib/ntp/ntp.drift
20 Oct 06:36:33 ntpd[458]: signal_no_reset: signal 17 had flags 4000000
20 Oct 06:36:35 ntpd[458]: signal_no_reset: signal 14 had flags 4000000
20 Oct 06:37:05 ntpd[458]: parent died before we finished, exiting


From the daemon log file:

Oct 20 19:40:54 activy ntpd[907]: ntpd [EMAIL PROTECTED]:4.2.0a+stable-2-r Fri Aug 26 10:30:12 UTC 2005 (1)
Oct 20 19:40:54 activy ntpd[907]: signal_no_reset: signal 13 had flags 4000000
Oct 20 19:40:54 activy ntpd[907]: precision = 2.000 usec
Oct 20 19:40:54 activy ntpd[907]: Listening on interface wildcard, 0.0.0.0#123
Oct 20 19:40:54 activy ntpd[907]: Listening on interface lo, 127.0.0.1#123
Oct 20 19:40:54 activy ntpd[907]: Listening on interface eth0, 192.168.192.77#123
Oct 20 19:40:54 activy ntpd[907]: kernel time sync status 0040


Running "strace -f" on the startup script yields the following (tail of
the output only, as it was very long):

[pid   573] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid   573] sigreturn()                 = ? (mask now [RTMIN])
[pid   573] gettimeofday({1130269798, 250610}, NULL) = 0
[pid   573] gettimeofday({1130269798, 251333}, NULL) = 0
[pid   573] gettimeofday({1130269798, 252103}, NULL) = 0
[pid   573] gettimeofday({1130269798, 253088}, NULL) = 0
[pid   573] time(NULL)                  = 1130269798
[pid   573] write(7, "53668 71398.253 127.127.1.0 9014"..., 81) = 81
[pid   573] select(7, [4 5 6], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
[pid   573] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid   573] sigreturn()                 = ? (mask now [RTMIN])
[pid   573] select(7, [4 5 6], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
[pid   573] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid   573] sigreturn()                 = ? (mask now [RTMIN])
[pid   573] gettimeofday({1130269800, 250588}, NULL) = 0
[pid   573] sendto(6, "\343\0\6\366\0\0\0\0\0\0\0\6INIT\0\0\0\0\0\0\0\0\0\0\0"..., 48, 0, {sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr("161.53.30.3")}, 16) = 48
[pid   573] select(7, [4 5 6], NULL, NULL, NULL) = 1 (in [6])
[pid   573] gettimeofday({1130269800, 322973}, NULL) = 0
[pid   573] select(7, [4 5 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
[pid   573] recvfrom(6, "$\2\6\353\0\0\1\222\0\0\v:\2415\1\2\307\t\10\252\231Y\255"..., 1092, 0, {sa_family=AF_INET, sin_port=htons(123), sin_addr=inet_addr("161.53.30.3")}, [16]) = 48
[pid   573] select(7, [6], NULL, NULL, {0, 0}) = 0 (Timeout)
[pid   573] gettimeofday({1130269800, 326593}, NULL) = 0
[pid   573] gettimeofday({1130269800, 327440}, NULL) = 0
[pid   573] gettimeofday({1130269800, 328233}, NULL) = 0
[pid   573] time(NULL)                  = 1130269800
[pid   573] write(7, "53668 71400.328 161.53.30.3 9014"..., 82) = 82
[pid   573] select(7, [4 5 6], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
[pid   573] --- SIGALRM (Alarm clock) @ 0 (0) ---
[pid   573] sigreturn()                 = ? (mask now [RTMIN])
[pid   573] gettimeofday({1130269801, 249125}, NULL) = 0
[pid   573] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 573 detached
--- SIGALRM (Alarm clock) @ 0 (0) ---
<... rt_sigsuspend resumed> )           = -1 EINTR (Interrupted system call)
alarm(30)                               = 0
sigreturn()                             = ? (mask now [RTMIN])
getppid()                               = 1
time(NULL)                              = 1130269825
getpid()                                = 574
write(8, "25 Oct 21:50:25 ntpd[574]: paren"..., 67) = 67
munmap(0x40019000, 4096)                = 0
exit_group(0)                           = ?

Process 574 detached


The first process [573] dies of a segmentation fault, which causes the
second [574] to exit as well. Again, the process IDs do not match the
preceding examples.
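
For a very long "strace -f" capture, the fatal signal can be located without
reading the whole output. The following is only a minimal sketch in Python:
the function name fatal_signals and the set of signals treated as fatal are
my own choices for illustration, and the pattern assumes the line format
shown in the extract above ("[pid N] --- SIGxxx (...) ---").

```python
import re

# Matches strace signal-delivery lines such as:
#   [pid   573] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
# The "[pid N]" prefix is optional (strace drops it once only one process is left).
SIGNAL_LINE = re.compile(r"^(?:\[pid\s+(\d+)\]\s+)?--- (SIG[A-Z0-9]+) \(")

# Assumption: signals regarded as "fatal" for this purpose.
FATAL = {"SIGSEGV", "SIGBUS", "SIGABRT", "SIGILL", "SIGFPE"}


def fatal_signals(lines):
    """Return (pid, signal) pairs for fatal signals; pid is None without a prefix."""
    hits = []
    for line in lines:
        match = SIGNAL_LINE.match(line.strip())
        if match and match.group(2) in FATAL:
            pid = int(match.group(1)) if match.group(1) else None
            hits.append((pid, match.group(2)))
    return hits


sample = [
    "[pid   573] --- SIGALRM (Alarm clock) @ 0 (0) ---",
    "[pid   573] --- SIGSEGV (Segmentation fault) @ 0 (0) ---",
    "--- SIGALRM (Alarm clock) @ 0 (0) ---",
]
print(fatal_signals(sample))  # [(573, 'SIGSEGV')]
```

The periodic SIGALRM deliveries are ignored, so only the SIGSEGV of
process 573 is reported.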

   + + +   + + +   + + +   + + +   + + +   + + +   + + +   + + +   + + +

During the system upgrade I had asked for the old configuration files to
be kept. I then suspected that they might not match the new version and
tried the minimal configuration file which aptitude had written as a
backup file - and it worked!

The next step was to isolate the erroneous config commands - but no
single command line seems to be wrong by itself. At first I suspected
'special commands' like peer, restrict, broadcast - but none of them failed.
   So the (long) list of server commands came under suspicion: I
thought that one of the servers might be sending malicious data in order
to kill clients. Again, not a single one could be proven guilty.
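
Testing the server list line by line is tedious by hand. As a rough sketch
only (the helper name limit_servers is hypothetical, and it assumes one
command per line as in my ntp.conf), test configurations with a given number
of active server lines could be generated like this:

```python
def limit_servers(conf_text, n):
    """Return conf_text with all but the first n active 'server' lines commented out.

    Lines that are already commented ('#server ...') are left untouched.
    Note: the local clock line 'server 127.127.1.0' counts as a server line too.
    """
    out = []
    seen = 0
    for raw in conf_text.splitlines():
        # First whitespace-separated token of the line, if any.
        if raw.split()[:1] == ["server"]:
            seen += 1
            if seen > n:
                raw = "#" + raw
        out.append(raw)
    return "\n".join(out) + "\n"


conf = "server a.example\nserver b.example\n#server c.example\nserver d.example\n"
print(limit_servers(conf, 2), end="")
```

Writing the result for n = 1, 2, 3, ... to a scratch file and starting ntpd
with each one would show at which count the crash first appears. The host
names in the sample are placeholders, not real servers.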

   + + +   + + +   + + +   + + +   + + +   + + +   + + +   + + +   + + +

It seems to be a QUANTITY PROBLEM, according to my tests up to now.

The following is my current /etc/ntp.conf in its working state:

activy:~# cat /etc/ntp.conf
# /etc/ntp.conf, configuration for ntpd

# ntpd will use syslog() if logfile is not defined
logfile /var/log/ntpd

driftfile /var/lib/ntp/ntp.drift
statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable


#       Time server list:

##      Stratum 11 (local)
peer   192.168.192.9            # P100, equivalent
server 192.168.192.34
server 192.168.192.88
server 192.168.192.7

##      Stratum-2-Server:
#server st.ntp.carnet.hr        # HR
#server time.ijs.si             # SI
server  biofiz.mf.uni-lj.si     # SI
server  ntp2.tuxfamily.net      # FR
#server ntp.univ-lyon1.fr       # FR
#server ntp1.pucpr.br           # BR
#server ntp2.contactel.cz       # CZ
server  ntp.karpo.cz            # CZ
#server ntp.doubleukay.com      # MY
#server fartein.ifi.uio.no      # NO
server  tock.keso.fi            # FI
#server sign.chg.ru             # RU
#server ntp.psn.ru              # RU
#server clock.cimat.ues.edu.sv  # SV
#server ntp.saard.net           # AU
#server timelord.uregina.ca     # CA
#server ntp3.cs.wisc.edu        # US
#server tock.nml.csir.co.za     # ZA
server  ntp4.uni-augsburg.de    # DE

##      Stratum-1-Server:
# server        ntps1-2.uni-erlangen.de
# server        ntp2.fau.de     # Uni Erlangen
# server        ntp3.fau.de
# server        ntp2.ptb.de     # PTB Braunschweig
# server        ntp1.ptb.de
# server        tick.usno.navy.mil

# pool.ntp.org maps to more than 100 low-stratum NTP servers.
# Your server will pick a different set every time it starts up.
#  *** Please consider joining the pool! ***
#  ***  <http://www.pool.ntp.org/#join>  ***
server          pool.ntp.org
#server pool.ntp.org
## uncomment for extra reliability

# ... and use the local system clock as a reference if all else fails
# NOTE: in a local network, set the local stratum of *one* stable server
# to 10; otherwise your clocks will drift apart if you lose connectivity.
server 127.127.1.0              # local clock (LCL)
fudge  127.127.1.0 stratum 13   # LCL is unsynchronized


##      Access rights:

# By default, exchange time with everybody, but don't allow configuration.
# See /usr/share/doc/ntp-doc/html/accopt.html for details.
restrict default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1 nomodify

# Clients from this subnet have unlimited access,
# but only if cryptographically authenticated
#restrict 192.168.192.0  mask  255.255.255.0 notrust
# LAN hosts are served unencrypted, but are not allowed to modify:
restrict 192.168.192.0  mask  255.255.255.0 kod notrap nomodify
# P450 may do anything:
restrict 192.168.192.7  mask  255.255.255.255


##      Broadcast:

# If you want to provide time to your local subnet, change the next line.
broadcast       192.168.192.255 # for the LAN

# If you want to listen to time broadcasts on your local subnet,
# de-comment the next lines. Please do this only if you trust everybody
# on the network!
#disable auth
#broadcastclient


The "ps fax" contains the following 2 lines for ntpd:

  428 ?        SLs    0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid
  453 ?        S      0:00  \_ /usr/sbin/ntpd -p /var/run/ntpd.pid


   + + +   + + +   + + +   + + +   + + +   + + +   + + +   + + +   + + +

If I add just 1 more server (by removing the comment character # at the
beginning of a line in the config file), the daemon crashes soon after
being started - no matter whether it is started at boot or manually.

I have no explanation for this behaviour. Is there a new limit on the
number of servers, which I may have overlooked in the documentation,
or is it a real bug?
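
To state the threshold precisely, the active association lines of a
configuration can be counted. This is only a minimal sketch in Python; the
function name count_associations is my own, and it only looks at the three
commands used in my file (server, peer, broadcast). It counts config lines,
not the addresses a host name may resolve to.

```python
# Commands from the ntp.conf above that set up an association.
ASSOCIATION_KEYWORDS = {"server", "peer", "broadcast"}


def count_associations(conf_text):
    """Count active (uncommented) association lines per command."""
    counts = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword = line.split()[0]
        if keyword in ASSOCIATION_KEYWORDS:
            counts[keyword] = counts.get(keyword, 0) + 1
    return counts


excerpt = """\
peer   192.168.192.9            # P100
server 192.168.192.34
#server time.ijs.si             # SI
server  pool.ntp.org
broadcast       192.168.192.255
restrict default kod notrap nomodify nopeer noquery
"""
print(count_associations(excerpt))  # {'peer': 1, 'server': 2, 'broadcast': 1}
```

Run against the working file and against the file with one more server
enabled, this would give the two counts on either side of the crash.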

There is a bug report at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=316242 showing the same
symptom. But the explanation does not fit my case, as I do not have a
single IPv6 address.


--

Regards,

 -----------------
 Eberhard Spittler
 [ http://spittler.name/ ]
