Fellow nuts,

I'd like to report my experience building and characterizing an NTP client on the ESP32. My client was able to synchronize to UTC using an Internet time server about 4ms away, with the following accuracy in terms of absolute error:

Median: 216us
95th percentile: 801us
99th percentile: 1,607us
99.9th percentile: 2,461us

--Background--

The ESP32 is a family of very low-cost but highly capable microcontroller chips that have become popular with hobbyists. You can buy a complete module on Amazon -- dual-core, 160MHz, with built-in WiFi and Bluetooth, plus a USB interface for programming -- for about $7. The previous-generation ESP8266 is sold for the even more absurd price of about $4!

Lately I've been interested in trying to collect sensor data with an ESP32 and report it to a server using the chip's built-in WiFi. Of course, I'd like the sensor readings to be properly timestamped. Although there are a few NTP libraries available for the ESP32, none that I found had reports on their performance. In addition, they all did "one-shot" synchronization, i.e., determining, at a single point in time, the current offset of the local clock from the NTP timescale. They did not attempt to keep a history of observations, reject outliers, or correct for local clock rate error. I was convinced I could do better.

Searching the archives of this list, I have not seen much discussion of the ESP32 other than speculation a few years ago that one might be used as an NTP server by attaching a GPS to it.

--Implementation--

I wrote an NTP client for the ESP32 (https://github.com/jelson/rulos/tree/main/src/lib/chip/esp32/periph/ntp) that periodically sends a request to an NTP server and keeps a history of recent observations of the offset between the local clock and the NTP server's clock. The one-way latency is assumed to be half the round trip time, minus the server's reported processing delay and a calibration constant described below.

When first starting up, it sends a request every 4 seconds until it has 10 responses. Then, in the steady state, it sends a request every 60 seconds, keeping a history of the past 30 minutes of observations.
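
A minimal sketch of how a single observation could be turned into an offset estimate, written in Python for brevity rather than the C that runs on the device. The names (Exchange, round_trip, offset) are hypothetical and are not taken from the library. It assumes the conventional NTP reading of the description above: the server's processing delay is removed from the round trip before halving, and the calibration constant is then subtracted from the one-way latency.

```python
from dataclasses import dataclass

# Send/receive asymmetry constant; 200us is the value quoted in the post.
CALIBRATION_S = 200e-6


@dataclass
class Exchange:
    t1: float  # local clock when the request was sent
    t2: float  # server clock when the request arrived
    t3: float  # server clock when the reply was sent
    t4: float  # local clock when the reply arrived


def round_trip(x: Exchange) -> float:
    """Network round-trip time, excluding the server's processing delay."""
    return (x.t4 - x.t1) - (x.t3 - x.t2)


def offset(x: Exchange, calibration: float = CALIBRATION_S) -> float:
    """Estimated (server time - local time) at the moment the reply arrived."""
    one_way = round_trip(x) / 2 - calibration
    return (x.t3 + one_way) - x.t4
```

For example, with a local clock exactly 5 seconds behind the server, 2ms of latency in each direction, and 1ms of server processing, Exchange(100.0, 105.002, 105.003, 100.005) has a round trip of 4ms, and offset() with a calibration of zero recovers the 5-second difference.
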

Each time a new observation is received, it performs simple least-squares linear regression on all the observations to construct a linear model relating the local clock to the NTP timescale. The user interface to my library is simple -- "get current epoch timestamp" -- which works by reading the local clock and applying the linear model to predict the corresponding NTP timestamp.

Linear regression has two good effects: first, it averages away noise in individual observations (assuming they are uncorrelated and unbiased). Second, it models the rate difference between the local clock and the NTP timescale, which significantly improves the accuracy even if time has elapsed since the most recent observation.

--Reducing Timestamp Jitter--

The ESP32 network stack is built on a lightweight TCP/IP stack called "LWIP". It has an API that looks much like the Berkeley socket API, e.g. sendto() and recvfrom() to send and receive UDP packets. Unfortunately, LWIP has chosen not to propagate interrupt-time reception timestamps up to the application as metadata.

Early revisions of my NTP client did the same thing that all the other ESP32 NTP clients do: record the time before sending a UDP packet, and again when we read the data out of LWIP's buffers using recv(). This leads to high jitter (on the scale of tens of milliseconds), as timestamp acquisition is delayed behind all packet processing, which itself is subject to the whims of the ESP32's FreeRTOS scheduler.

However, the ESP32 does offer a shortcut: the WiFi driver has a "promiscuous mode" in which user programs can receive a callback directly from the WiFi driver whenever a packet is received. This is meant for applications such as WiFi sniffing, but it has the nice side effect of giving me a low-jitter event to record packet reception times.

In fact, the promiscuous mode callback contains WiFi metadata including a reception timestamp acquired directly *in the WiFi hardware*.
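
As an aside on the regression step from the Implementation section, here is a minimal Python sketch of fitting and applying such a linear model. It is an illustration under assumptions, not the library's code: the function names and the (slope, mean_local, mean_ntp) model layout are hypothetical. The fit is done about the mean of the data, which helps keep floating-point precision when the timestamps are epoch-sized numbers.

```python
def fit_clock_model(local_times, ntp_times):
    """Least-squares fit of ntp = mean_ntp + slope * (local - mean_local)."""
    n = len(local_times)
    if n < 2 or n != len(ntp_times):
        raise ValueError("need two or more paired observations")
    mean_local = sum(local_times) / n
    mean_ntp = sum(ntp_times) / n
    # Sums of squares and cross products about the means.
    sxx = sum((x - mean_local) ** 2 for x in local_times)
    sxy = sum((x - mean_local) * (y - mean_ntp)
              for x, y in zip(local_times, ntp_times))
    if sxx == 0:
        raise ValueError("observations must span more than one instant")
    return sxy / sxx, mean_local, mean_ntp


def local_to_ntp(local_now, model):
    """Predict the NTP timestamp that corresponds to a local clock reading."""
    slope, mean_local, mean_ntp = model
    return mean_ntp + slope * (local_now - mean_local)
```

The slope captures the rate difference between the two clocks, so a prediction made some time after the last observation still accounts for the drift accumulated since then.
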

Using the WiFi hardware timestamp would require some extra work to relate the WiFi clock to the system clock, but it could be another source of jitter reduction; I have not implemented it. There seems to be no similar way to get precise send-time timestamps.

--High RTT Filter--

My tests were all against a public Stratum 2 NTP server run by the University of Washington. From my home in downtown Seattle, the RTT to that server from a WiFi-connected laptop is about 4ms. However, despite using promiscuous mode to reduce jitter on the receive timestamps, my client experienced a long tail of jitter in the RTTs, as seen in the graph below:

https://www.lectrobox.com/projects/esp32-ntp/graphs/calibrated-wifi-interrupts-long.log.rtt_hist.png

Since my laptop does not see this jitter, it is likely coming from some part of the ESP32's network stack or WiFi hardware. Before using promiscuous mode for receive-side timestamps, the distribution of errors was symmetrical; after adding promiscuous mode, the distribution became more one-sided. This leads me to conclude that the remaining jitter is primarily variability on the sending side, i.e., the time it takes the LWIP network stack and underlying WiFi hardware to send a packet.

Of course, these sorts of delays contribute directly to error. As a simple filter, my client discards observations in the top quartile of RTTs before feeding the remainder into the linear regression algorithm.

--Latency Asymmetry Correction--

As people familiar with NTP know, its accuracy depends on symmetry in the forward and reverse path latencies. Only the round-trip time can be measured directly, so we infer that the one-way latency is half the RTT. This inference is only correct if the forward and reverse latencies are equal, and any asymmetry between those two latencies contributes directly to error.
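
The two ideas above, the top-quartile RTT filter and the error caused by path asymmetry, can be sketched in a few lines of Python. This is an illustration only: the function names and the (rtt, offset) tuple layout are hypothetical, the exact cutoff rule (dropping a quarter of the samples, rounded down) is an assumption, and the latencies in the example are made up.

```python
def drop_top_quartile(observations):
    """Discard the quarter of observations with the largest RTT.

    observations: list of (rtt_s, offset_s) tuples. The original order of
    the surviving observations is preserved.
    """
    n_drop = len(observations) // 4
    if n_drop == 0:
        return list(observations)
    by_rtt = sorted(range(len(observations)),
                    key=lambda i: observations[i][0])
    dropped = set(by_rtt[-n_drop:])
    return [obs for i, obs in enumerate(observations) if i not in dropped]


def offset_error_from_asymmetry(forward_s, reverse_s):
    """Offset error when the one-way latency is taken as half the RTT.

    The client assumes the reply spent (forward + reverse) / 2 in flight,
    but it actually spent `reverse`, so the estimate is off by the gap.
    """
    assumed_return = (forward_s + reverse_s) / 2
    return assumed_return - reverse_s  # equals (forward - reverse) / 2
```

With hypothetical one-way latencies of 2.4ms outbound and 2.0ms inbound, the RTT is 4.4ms and the halving rule assigns 2.2ms to the return leg, so the offset estimate is wrong by 0.2ms: half of the 0.4ms imbalance.
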

The symmetry assumption more or less works on the Internet: though packet queues will sometimes increase the latency in one direction or the other, in the long term half the RTT is often an unbiased estimate.

My early testing indicated that the latency asymmetry on the ESP32 was biased. That is, when I compared the NTP-acquired timestamp to ground truth in a long-term test (as described in the next section), I found the errors were not centered around zero, but were biased in one direction with a median of about 200us, as seen in the histogram below:

https://www.lectrobox.com/projects/esp32-ntp/graphs/zerocal-wifi-interrupts-2.log.gps_vs_ntp.hist.png

I suspect this is a difference between the time required to send a packet through the ESP32's stack and the time required to receive it. Note in the histogram that the errors are highly asymmetrical; this is because there's high jitter in the estimate of when we send a packet, but very little in the reception timestamps.

I chose to calibrate this asymmetry out by subtracting a hard-coded constant of 200us from the latency estimate.

--Evaluation--

I evaluated my NTP client by using it to timestamp the PPS output of a uBlox M10 GNSS. My antenna has a view of about 180 degrees of sky. The uBlox is configured to listen to the GPS, Galileo and GLONASS constellations, and during most of this test was using about 20 satellites for its solution.

Once per second, at the top of each UTC second, the GNSS generates a pulse, which I wired into one of the ESP32's GPIO pins. A GPIO interrupt handler on the ESP32 gets the current timestamp from my NTP service and emits it to the serial port. I record the results and compute the error in the reported timestamp, e.g., a reported timestamp of 1644310833.999917 has an error of 83 microseconds.

During my tests, the ESP32 sent an NTP request to the University of Washington's public Stratum 2 NTP server over the Internet every 60 seconds. The nominal RTT to that server from my home via WiFi is about 4ms.
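
The error calculation described above amounts to measuring how far each logged timestamp falls from the nearest whole second. A minimal Python sketch follows, assuming the log holds one decimal timestamp per PPS edge with microsecond resolution; the function names are hypothetical, and the nearest-rank percentile is just one common definition, not necessarily the one used to produce the figures in this post. Parsing the text with Decimal sidesteps the roughly 0.2us rounding that a double would introduce at epoch-sized values.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN


def pps_error_us(timestamp_text):
    """Signed error, in whole microseconds, of a timestamp taken at a PPS edge.

    Negative means the reported time is earlier than the true top of second.
    Digits beyond the microsecond are truncated.
    """
    ts = Decimal(timestamp_text)
    nearest_second = ts.to_integral_value(rounding=ROUND_HALF_EVEN)
    return int((ts - nearest_second) * 1_000_000)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]
```

For the example in the text, pps_error_us("1644310833.999917") returns -83. A summary of abs(error) could then be produced by sorting the absolute values and calling percentile() for 50, 95, 99 and 99.9.
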

I collected over 100,000 data points. This graph shows time-series data of the errors over the course of the experiment:

https://www.lectrobox.com/projects/esp32-ntp/graphs/calibrated-wifi-interrupts-long.log.gps_vs_ntp.timeseries.png

Here is a histogram of the absolute errors recorded, excluding the first ten minutes of warmup. The median is close to 0, thanks to the calibration constant I mentioned in the previous section. Also note that the long tail of errors is one-sided (other than some outliers). As described earlier, this is because I can acquire low-jitter reception timestamps but cannot do so for send-time timestamps.

https://www.lectrobox.com/projects/esp32-ntp/graphs/calibrated-wifi-interrupts-long.log.gps_vs_ntp.hist.log.png

Numerically, abs(error) can be summarized as:

Median: 216us
95th percentile: 801us
99th percentile: 1,607us
99.9th percentile: 2,461us

--Drawbacks--

My implementation has several drawbacks compared to a real NTP client. Perhaps foremost is that it makes no effort to create a monotonic timescale. Every time a new observation arrives, a new linear regression is performed, and all time queries immediately start using that new model. This can result in a discontinuity in the timescale, e.g., two sensor observations collected 10ms apart might have timestamps that differ by significantly more than 10ms. This is in contrast to the reference NTP client, which gradually slews the clock with the goal of keeping the local timescale continuous and keeping the instantaneous rate close to SI seconds, even while adjustments are happening. The ESP32 does support an adjtime() call, so a continuous timescale should be possible.

Another drawback is that my outlier rejection is not very sophisticated, and least-squares fitting is very sensitive to big outliers.
The next step, if I ever return to this project, would be an iterative algorithm that draws a linear regression line, looks for outliers, discards them, and redraws the regression line with the remaining points.

--Summary--

Despite being cheap hardware with a slow network stack that suffers from high jitter, an ESP32 can get a view of UTC that is typically accurate to under 1 millisecond, using only an Internet NTP server accessed over its built-in WiFi. Though worse than the performance one would expect from a desktop computer, it's in the same ballpark, and could be improved by investing more time in better outlier rejection.

Regards,
-Jeremy N3UUO