Which version of Kudu are you using? I hit the same error a few days ago on Kudu 1.3.0: "Tried to update clock beyond the max. error." After I restarted the cluster, everything went back to normal. I checked dmesg and asked SRE to check the NTP service, and both looked normal. I still have no idea what caused the error.
2017-11-01 10:12 GMT+08:00 Franco Venturi <fvent...@comcast.net>:

> A few days ago at work our Kudu servers started having fatal errors and
> shutting down with the following error message:
>
> Couldn't get the current time: Clock unsynchronized. Status: Service
> unavailable: Error: Clock synchronized but error was too high (10000016 us).
>
> After some research in the community forums, I found a post by Todd that
> pointed to this JIRA issue: https://issues.apache.org/jira/browse/KUDU-2079
>
> I then checked our ntpd configuration and sure enough we had the '-x'
> option in the daemon command, so I went ahead, removed that option,
> restarted ntpd, and a few minutes later I restarted all the Kudu processes
> (one master and three tablet servers).
> A few minutes later a couple of those Kudu processes were down again, this
> time with this new time sync related error message:
>
> Tried to update clock beyond the max. error.
>
> To try to address this new error, I brought down all the Kudu processes,
> stopped ntpd, resync'd the time on all the servers with ntpdate, brought
> ntpd back up, waited a bit, and restarted Kudu (master and tablet servers).
> A few minutes or less later a couple of them were down again with the same
> 'Tried to update clock beyond the max. error.'
>
> I eventually ended up doubling the parameter 'max_clock_sync_error_usec'
> to 20,000,000 (20 seconds) and everything stayed up (and is still up).
>
> Looking at the source code in git, I found the relevant section here
> (source file https://github.com/apache/kudu/blob/master/src/kudu/clock/hybrid_clock.cc):
>
>     // we won't update our clock if to_update is more than 'max_clock_sync_error_usec'
>     // into the future as it might have been corrupted or originated from an out-of-sync
>     // server.
>     if ((to_update_physical - now_physical) > FLAGS_max_clock_sync_error_usec) {
>       return Status::InvalidArgument("Tried to update clock beyond the max. error.");
>     }
>
> If I understand this code correctly, it is complaining because for some
> reason Kudu is trying to update its clock by more than 10 seconds - however
> I ran ntptime and several ntpq queries, and I don't see the time between
> the servers being off by that much (or even by say half a second, since
> they are all synchronized with a stratum 3 NTP server).
>
> Has anyone in this group seen anything similar or does anyone have a
> better understanding of what this message means and what could be causing
> it?
>
> Thanks,
> Franco
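
For anyone else reading the snippet quoted above, here is how I read that check, written as a small standalone sketch. This is not the Kudu code itself. The function name and the constant are made up for illustration, and the 10,000,000 us default is only inferred from Franco saying that doubling the flag gave 20,000,000. The comparison is one-sided: only a timestamp that is too far in the future relative to the local physical clock is rejected.

```cpp
#include <cstdint>

// Hypothetical stand-in for FLAGS_max_clock_sync_error_usec.
// Assumption: 10 seconds, inferred from the thread (doubling it gave 20,000,000).
constexpr int64_t kAssumedMaxClockSyncErrorUsec = 10000000;

// Mirrors the comparison in the quoted snippet: returns true when the
// incoming physical time is more than the allowed limit ahead of the local
// physical time. Both arguments are microseconds; names are illustrative.
bool UpdateWouldBeRejected(
    int64_t to_update_physical_usec,
    int64_t now_physical_usec,
    int64_t max_clock_sync_error_usec = kAssumedMaxClockSyncErrorUsec) {
  return (to_update_physical_usec - now_physical_usec) >
         max_clock_sync_error_usec;
}
```

With the assumed default, an incoming timestamp 15 seconds ahead of the local clock is rejected, while the same timestamp passes once the limit is raised to 20,000,000. A timestamp that is behind the local clock never trips this check, however large the gap.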