** Description changed:

- We're seeing a race between if-up.d/ntpdate and the ntp startup script.
+ [Impact] 
+ * Hardware clocks are not stepped at boot, which can prevent NTP from ever
+   syncing the clock.
+   Incorrect clocks can cause serious issues in distributed systems.
  
- 1) if-up.d/ntpdate starts.
- 2) if-up.d/ntpdate acquires the lock "/var/lock/ntpdate-ifup".
- 3) if-up.d/ntpdate stops the ntp service [which isn't running anyway].
- 4) if-up.d/ntpdate starts running ntpdate, which binds UDP *.ntp
- 5) /etc/init.d/rc 2 executes "/etc/rc2.d/S20ntp start"
- 6) /etc/init.d/ntp acquires the lock "/var/lock/ntpdate".
- 7) /etc/init.d/ntp starts the ntp daemon.
- 8) The ntp daemon logs an error, complaining that it cannot bind UDP *.ntp.
- 9) if-up.d/ntpdate now starts the ntp service.
+ * Upstream originally added a lock file to eliminate a race between the ntp
+   service (which keeps the clock synchronized during normal operation) and
+   ntpdate (which is used to step the clock by large intervals at boot time).
+   That change had a flaw which introduced a deadlock. An Ubuntu patch was
+   applied which broke the locking mechanism entirely, reintroducing the race
+   condition.
  
- The result is a weird churn, though ntpd does end up running at the end.
+ * This change undoes the Ubuntu patch and fixes the deadlock by unlocking
+   before attempting to start the ntp service.
  
- Should these not be using the same lock file?
+ [Test Case]
+ 
+ * There are two bugs: the race and the deadlock. To reproduce the race more
+   consistently:
+   - add 'sleep 30' to '/etc/network/if-up.d/ntpdate' on the line preceding
+     '/usr/sbin/ntpdate-debian -s $OPTS 2>/dev/null || :', and comment out
+     'invoke-rc.d --quiet $service stop >/dev/null 2>&1 || true'. This will
+     reproduce the case where the ntp service starts between the stop command
+     and the ntpdate command.
+     The result will be that the ntpdate command fails. There will be a
+     message in syslog like:
+       'ntpdate[17660]: the NTP socket is in use, exiting'
+   - Reintroducing the lock brings back the deadlock issue. Both the ntpdate
+     if-up.d script and the ntp init script check the lock file, but the
+     ntpdate script tried to start the ntp init script while still holding
+     the lock. Releasing the lock before the init script invocation fixes
+     the deadlock. The original deadlock behavior is described here:
+       https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/246203
+ 
+ [Regression Potential]
+ 
+ * Low. Out-of-sync clocks could be stepped by a large amount at boot time,
+   but only on machines with static IPs. A large step is only likely if the
+   clock was badly skewed at boot time, which is itself unlikely since NTP
+   usually keeps the software clock in sync during operation and the
+   hardware clock is updated at shutdown.
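
To check whether the race from the test case was hit, one option is to count occurrences of the ntpdate error quoted in the description. The following is a minimal sketch, not part of the proposed change: the function name count_ntpdate_socket_errors is hypothetical, and the second sample line is made up purely for the demo.

```shell
#!/bin/sh
# Hypothetical helper: count log lines carrying the ntpdate error message
# quoted in the test case. $1 is the log file to scan.
count_ntpdate_socket_errors() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -F -c -- 'the NTP socket is in use, exiting' "$1" || true
}

# Demo against a temporary file instead of the real system log.
sample=$(mktemp)
printf '%s\n' \
    'ntpdate[17660]: the NTP socket is in use, exiting' \
    'ntpd[1234]: unrelated line (made up for this demo)' > "$sample"
count_ntpdate_socket_errors "$sample"
rm -f "$sample"
```

On a test machine the same function could be pointed at the system log (commonly /var/log/syslog on Ubuntu, though that depends on the logging setup) after a reboot with the 'sleep 30' modification in place.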

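The ordering problem behind the deadlock can be illustrated with a small stand-alone sketch. It does not reproduce the real scripts: the lock is a temporary directory rather than the lock files named above, all function names are hypothetical, and the stand-in service fails immediately when the lock is held instead of blocking as the real init script would.

```shell
#!/bin/sh
# Sketch only: stand-ins for the if-up.d hook and the ntp init script.
LOCKDIR="${TMPDIR:-/tmp}/ntpdate-sketch.$$.lock"   # hypothetical lock path
EVENTS=""

log() { EVENTS="${EVENTS}${EVENTS:+ }$1"; }

acquire_lock() { mkdir "$LOCKDIR" 2>/dev/null; }   # mkdir fails if lock is held
release_lock() { rmdir "$LOCKDIR" 2>/dev/null; }

# Stand-in for the ntp init script: needs the lock to start the daemon.
start_ntp_service() {
    if acquire_lock; then
        log "ntpd-started"
        release_lock
    else
        log "ntpd-blocked"
        return 1
    fi
}

# Old order: the service is started while the hook still holds the lock.
ifup_hook_old() {
    acquire_lock || return 1
    log "ntpdate-ran"
    start_ntp_service || true
    release_lock
}

# Fixed order: release the lock first, then start the service.
ifup_hook_fixed() {
    acquire_lock || return 1
    log "ntpdate-ran"
    release_lock
    start_ntp_service
}

ifup_hook_old;   OLD_EVENTS=$EVENTS; EVENTS=""
ifup_hook_fixed; FIXED_EVENTS=$EVENTS
echo "old order:   $OLD_EVENTS"
echo "fixed order: $FIXED_EVENTS"
```

In the old order the stand-in service finds the lock held and cannot start; in the fixed order it starts after ntpdate has finished and the lock is free.
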
-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to ntp in Ubuntu.
https://bugs.launchpad.net/bugs/1125726

Title:
  boot-time race between /etc/network/if-up.d/ntpdate and
  "/etc/init.d/ntp start"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/1125726/+subscriptions
