Easy enough to try out. Give it a shot and enter a jira if you find an issue.
Regards,

Patrick

On Thu, Dec 7, 2017 at 5:47 AM, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:

> System.nanoTime() is not affected by clock changes. Really everyone - this
> is simply not an issue in ZooKeeper.
>
> ====================
> Jordan Zimmerman
>
> > On Dec 7, 2017, at 7:43 AM, Kathryn Hogg <kathryn.h...@oati.net> wrote:
> >
> > I'm pretty new to zookeeper but have a fair amount of experience with
> > virtual synchrony going back many years. Even though time is relative, it
> > is possible, if the clock suddenly jumps forward on the server, to
> > prematurely declare timeouts as expired. I'm not sure how Zookeeper
> > handles that but in Isis, if 2 consecutive calls to gettimeofday had too
> > large of a difference, it considered it fishy.
> >
> > Of course, this is why we use ntp with adjtime to avoid clocks going
> > backwards or making large jumps forward.
> >
> > -----Original Message-----
> > From: Patrick Hunt [mailto:ph...@apache.org]
> > Sent: Wednesday, December 06, 2017 5:18 PM
> > To: UserZooKeeper <user@zookeeper.apache.org>
> > Subject: Re: Zookeeper session expiration
> >
> > {External email message: This email is from an external source. Please
> > exercise caution prior to opening attachments, clicking on links, or
> > providing any sensitive information.}
> >
> > What Jordan said + time use is only in the relative sense, not the
> > absolute. Session tracking (expiration) is relative to the start of
> > leadership.
> >
> > Patrick
> >
> >> On Mon, Dec 4, 2017 at 12:21 PM, Jordan Zimmerman <jor...@jordanzimmerman.com> wrote:
> >>
> >> ZooKeeper, indeed, does not use wall clock time. It uses
> >> System.nanoTime() for most operations. Further, all operations go
> >> through the Leader node so only the Leader's notion of time matters.
> >> The Leader manages the session via a "SessionTracker" instance. The
> >> code is in SessionTrackerImpl.java.
> >> There is a sessionExpiryQueue which is a kind of priority queue that
> >> returns expired sessions based on System.nanoTime().
> >>
> >> -JZ
> >>
> >>> On Dec 4, 2017, at 12:09 PM, Abraham Fine <af...@apache.org> wrote:
> >>>
> >>> Hello Anthony and Shawn-
> >>>
> >>> To the best of my knowledge ZooKeeper does not use the "wall clock"
> >>> time anywhere. So that should not be the problem.
> >>>
> >>> Please consider enabling debug logging, which should allow you to
> >>> track the "pings".
> >>>
> >>> Thanks,
> >>> Abe
> >>>
> >>>> On Mon, Dec 4, 2017, at 11:51, Anthony Shaya wrote:
> >>>> Thanks Shawn, should I message the developer mailing list for a
> >>>> more definitive answer?
> >>>>
> >>>> Thanks again for the reply.
> >>>>
> >>>> -----Original Message-----
> >>>> From: Shawn Heisey [mailto:apa...@elyograg.org]
> >>>> Sent: Monday, December 4, 2017 2:49 PM
> >>>> To: user@zookeeper.apache.org
> >>>> Subject: Re: Zookeeper session expiration
> >>>>
> >>>>> On 12/4/2017 8:22 AM, Anthony Shaya wrote:
> >>>>> My question is related to how session expiration works, I noticed
> >>>>> on many of the client machines the times across these machines were
> >>>>> all off (by anywhere from 1 minute to 20 minutes - which was
> >>>>> resolved after discovery - haven't verified this completely yet).
> >>>>> Can this directly affect session expiration within the zookeeper
> >>>>> cluster?
> >>>>>
> >>>>> * I read the following in
> >>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwiki.apache.org%2Fhadoop%2FZooKeeper%2FFAQ&data=02%7C01%7C%7C6d6643860a4e4a8194c808d53b5023ec%7Cc61157e903cb47589165ee7845cb0ca3%7C0%7C0%7C636480137750841475&sdata=RwGGH19FLeYFmXMrg5GBkSLJ65ANj1EXkTvwyk6OLd4%3D&reserved=0 ,
> >>>>> "Expirations happens when the cluster does not hear from the client
> >>>>> within the specified session timeout period (i.e. no heartbeat).".
> >>>>> So in some cases it seems like if the times were wrong across the
> >>>>> machines it's possible one of the clients could have effectively
> >>>>> sent a heartbeat in the past (not sure about this tbh) and then
> >>>>> the cluster expires the session?
> >>>>
> >>>> I make these comments without any knowledge of what ZK code
> >>>> actually does. I am a member of this list because I'm a
> >>>> representative of the Apache Solr project, which uses the ZK client
> >>>> in order to maintain a cluster.
> >>>>
> >>>> IMHO, any software which makes actual decisions based on the
> >>>> timestamps in messages from another system is badly designed. I
> >>>> would hope that the ZK designers know this, and always make any
> >>>> decisions related to time using the clock in the local system only.
> >>>>
> >>>> If ZK's designers did the right thing, then a session timeout would
> >>>> indicate that quite literally no heartbeats were received in X
> >>>> seconds, as measured by the local clock, and the local clock ONLY
> >>>> ... NOT from timestamp information received from another system.
> >>>>
> >>>> Although such a lack of communication could be caused by any number
> >>>> of things, including network hardware failure, one of the most
> >>>> common reasons I have seen for problems like this is extreme java
> >>>> garbage collection pauses in the client software.
> >>>>
> >>>> Situations where the heap is a little bit too small can cause a
> >>>> java program to basically be doing garbage collection constantly,
> >>>> so it doesn't have much time to do anything else, like send
> >>>> heartbeats to ZK servers.
> >>>>
> >>>> Situations where the heap is HUGE and garbage collection is not
> >>>> well tuned can lead to pauses of a minute or longer while Java does
> >>>> a massive full GC.
> >>>>
> >>>>> * I don't have the zookeeper node log for the above time to see
> >>>>> what was going on in zookeeper when the cluster determined the
> >>>>> session expired.
> >>>>>
> >>>>> * Is there any additional logging I can turn on to troubleshoot zk
> >>>>> session expiration issues?
> >>>>
> >>>> Hopefully your ZK clients also have logging. Failing that, you
> >>>> could turn on GC logging for the software with the ZK client
> >>>> (assuming it's a Java client) and find a program or website that
> >>>> can examine the log and give you statistics or a graph of GC pauses.
> >>>>
> >>>> If there is a problem in software using the client and whatever
> >>>> logging is available doesn't help you figure out what's wrong,
> >>>> you're generally going to need to talk to whoever wrote that
> >>>> software for help troubleshooting it.
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>>> This message is intended exclusively for the individual or entity
> >>>> to which it is addressed. This communication may contain
> >>>> information that is proprietary, privileged, confidential or
> >>>> otherwise legally exempt from disclosure. If you are not the named
> >>>> addressee, or have been inadvertently and erroneously referenced in
> >>>> the address line, you are not authorized to read, print, retain,
> >>>> copy or disseminate this message or any part of it. If you have
> >>>> received this message in error, please notify the sender
> >>>> immediately by e-mail and delete all copies of the message.
> >>>> (ID m031214)
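
For anyone reading this thread later, here is a minimal Java sketch of the idea Jordan and Shawn describe above: the server expires a session based only on how much of its own monotonic time has passed since the last heartbeat, never on a timestamp supplied by the client. This is not the code in SessionTrackerImpl.java. The class name MonotonicSessionTracker and the methods touch and expireStale are hypothetical names made up for illustration, and a plain map scan stands in for the priority-queue-like sessionExpiryQueue. The clock is passed in as a LongSupplier (an assumption of this sketch) so that System::nanoTime can be replaced by a fake clock when experimenting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical illustration only; not ZooKeeper's SessionTrackerImpl.
final class MonotonicSessionTracker {
    // Monotonic time source, for example System::nanoTime.
    private final LongSupplier nanoClock;
    private final long timeoutNanos;
    // sessionId -> local clock reading taken when the last heartbeat arrived
    private final Map<Long, Long> lastHeard = new HashMap<>();

    MonotonicSessionTracker(LongSupplier nanoClock, long timeoutMillis) {
        this.nanoClock = nanoClock;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    // Record a heartbeat using the tracker's own clock, never a client timestamp.
    void touch(long sessionId) {
        lastHeard.put(sessionId, nanoClock.getAsLong());
    }

    // Remove and return every session that has been silent longer than the timeout.
    List<Long> expireStale() {
        long now = nanoClock.getAsLong();
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = lastHeard.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            // Only the difference between two local readings is meaningful.
            if (now - entry.getValue() > timeoutNanos) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

In real use the tracker would be built as new MonotonicSessionTracker(System::nanoTime, 30_000), with touch called for each ping and expireStale called periodically. Because only differences between local readings are compared, a client whose wall clock is 20 minutes off sends heartbeats that count exactly the same as any other client's.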