Public bug reported:

We observe NTP time drift on two servers running HWE kernels on Xenial.
A few weeks ago we wanted to switch from 4.4 to 4.10. After rebooting the
servers into the 4.10 kernel we saw a large time offset within minutes
of booting. Although ntpd was running, it could not keep up: the offset
persisted and kept growing over time.

After rebooting back into 4.4 we immediately noticed that the time
stayed normal. Since then I have tested about a dozen kernel versions,
which makes me think that something introduced in kernel 4.10 makes the
clock go out of sync.

So what do we observe?

After 1 min uptime:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u    5   16    7    0.497  100.084  81.382
+ntp1.bit.nl     193.0.0.229      2 u    8   16    7    0.603   93.241  70.643
+ntp2.bit.nl     193.67.79.202    2 u    8   16    7    0.582   93.218  70.674
+ntp3.bit.nl     193.79.237.14    2 u    9   16    7    0.781   90.488  70.574

A couple of minutes later (and also hours/days, the offset just keeps
growing over time)

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u   13   16  377    0.447  400.198 151.335
+ntp1.bit.nl     193.0.0.229      2 u   13   16  377    0.313  400.561 151.339
+ntp2.bit.nl     193.67.79.202    2 u   13   16  377    0.517  400.445 151.398
+ntp3.bit.nl     193.79.237.14    2 u   12   16  377    0.934  402.013 151.384
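
To compare such snapshots without reading the columns by eye, the peer
offsets can be pulled out of the `ntpq -p` text. The following is only a
minimal sketch: the function names are made up for illustration, and it
assumes the ten-column layout shown above with the tally code in the
first character of each peer line. The offset column is in milliseconds.

```python
def parse_peer_offsets(ntpq_output):
    """Return {remote: offset_ms} parsed from `ntpq -p` style text."""
    offsets = {}
    for line in ntpq_output.splitlines():
        if not line.strip():
            continue
        # Column 0 holds the tally code ('*', '+', ' ', ...); drop it.
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # skips the '====' separator line
        try:
            offsets[fields[0]] = float(fields[8])  # ninth column: offset
        except ValueError:
            continue  # skips the header line
    return offsets


def mean_offset(ntpq_output):
    """Average offset in milliseconds over all listed peers."""
    offsets = parse_peer_offsets(ntpq_output)
    return sum(offsets.values()) / len(offsets)


# The two snapshots from this report.
AFTER_ONE_MINUTE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u    5   16    7    0.497  100.084  81.382
+ntp1.bit.nl     193.0.0.229      2 u    8   16    7    0.603   93.241  70.643
+ntp2.bit.nl     193.67.79.202    2 u    8   16    7    0.582   93.218  70.674
+ntp3.bit.nl     193.79.237.14    2 u    9   16    7    0.781   90.488  70.574
"""

LATER = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u   13   16  377    0.447  400.198 151.335
+ntp1.bit.nl     193.0.0.229      2 u   13   16  377    0.313  400.561 151.339
+ntp2.bit.nl     193.67.79.202    2 u   13   16  377    0.517  400.445 151.398
+ntp3.bit.nl     193.79.237.14    2 u   12   16  377    0.934  402.013 151.384
"""

if __name__ == "__main__":
    print("mean offset after 1 min: %.3f ms" % mean_offset(AFTER_ONE_MINUTE))
    print("mean offset later:       %.3f ms" % mean_offset(LATER))
```

All four peers agree to within a few milliseconds in each snapshot,
which points at the local clock rather than at any single time source.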

As mentioned, I tested about a dozen kernels and I believe I have
pinpointed the release in which the drift was introduced: 4.10-rc1.
Below are the results for the kernels I have tested so far:

Tested: 4.4.0-112-generic: not affected
Tested: 4.8.0-41-generic: not affected
Tested: 4.8.0-58-generic: not affected
Tested: 4.9.0 mainline: not affected
Tested: 4.9.66 mainline: not affected
Tested: 4.10-rc1 mainline: affected
Tested: 4.10 mainline: affected
Tested: 4.10.0-38-generic: affected
Tested: 4.10.0-40-generic: affected
Tested: 4.13.0-16-generic: affected
Tested: 4.13.0-31-generic: affected
Tested: 4.14.3 mainline: affected
Tested: 4.15-rc1 mainline: affected
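
The list above can be reduced to a "last good / first bad" pair
mechanically. This is a rough sketch, not part of the original testing:
the helper name is hypothetical, and it assumes the results are ordered
from oldest to newest exactly as listed.

```python
# (kernel, affected) pairs in the order given in this report.
RESULTS = [
    ("4.4.0-112-generic", False),
    ("4.8.0-41-generic", False),
    ("4.8.0-58-generic", False),
    ("4.9.0 mainline", False),
    ("4.9.66 mainline", False),
    ("4.10-rc1 mainline", True),
    ("4.10 mainline", True),
    ("4.10.0-38-generic", True),
    ("4.10.0-40-generic", True),
    ("4.13.0-16-generic", True),
    ("4.13.0-31-generic", True),
    ("4.14.3 mainline", True),
    ("4.15-rc1 mainline", True),
]


def find_regression_window(results):
    """Return (last_good, first_bad) for results ordered oldest to newest."""
    first_bad = next(
        (i for i, (_, affected) in enumerate(results) if affected), None
    )
    if first_bad is None:
        # Nothing is affected; there is no regression window.
        return (results[-1][0] if results else None), None
    # A single good-to-bad transition is assumed; anything else needs
    # to be looked at by hand.
    if any(not affected for _, affected in results[first_bad:]):
        raise ValueError("results are not monotonic")
    last_good = results[first_bad - 1][0] if first_bad > 0 else None
    return last_good, results[first_bad][0]


if __name__ == "__main__":
    print(find_regression_window(RESULTS))
```

A later kernel that is no longer affected breaks the single-transition
assumption, so the helper raises instead of guessing.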

When I was about to file this bug report about an hour ago I noticed
that 4.15-rc9 was available, so I decided to give it a go to make sure I
had really tested the latest version. It has now been running for over
an hour and the time is stable.

Most likely the following entries from the changelog are related to the
issue we are having:

Len Brown (3):
      x86/tsc: Future-proof native_calibrate_tsc()
      x86/tsc: Fix erroneous TSC rate on Skylake Xeon
      x86/tsc: Print tsc_khz, when it differs from cpu_khz

Both servers that are having issues on our side are equipped with the
following CPU:

CPU model (from /proc/cpuinfo):
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
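
To check other machines for the same CPU, the fields above can be
matched programmatically. This is a small sketch with hypothetical
helper names; it parses text in the /proc/cpuinfo format and compares it
against the family and model values from this report. As far as I know,
family 6 model 85 is the Skylake server part, which would fit the
"Skylake Xeon" changelog entry, but that link is an assumption.

```python
def parse_cpuinfo_block(text):
    """Parse 'key : value' lines from /proc/cpuinfo-style text."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def matches_reported_cpu(info):
    """True if the CPU has the vendor, family and model from this report."""
    return (
        info.get("vendor_id") == "GenuineIntel"
        and info.get("cpu family") == "6"
        and info.get("model") == "85"
    )


# The excerpt quoted in this report.
SAMPLE = """\
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
"""

if __name__ == "__main__":
    info = parse_cpuinfo_block(SAMPLE)
    print(info["model name"], matches_reported_cpu(info))
```

On a live system the same function can be fed the first processor block
read from /proc/cpuinfo.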

Standard information as requested:
1:
Description:    Ubuntu 16.04.3 LTS
Release:        16.04

2: 
root@bit-host6:~# apt-cache policy linux-image-generic-hwe-16.04
linux-image-generic-hwe-16.04:
  Installed: 4.13.0.31.51
  Candidate: 4.13.0.31.51

3: Stable time

4: A big time offset

** Affects: linux-hwe (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1744988

Title:
  time drifting on linux-hwe kernels

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1744988/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
