Hi,

I remember Ben had opened a jira for clock jumps earlier:
https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to
have clocks jump forward in virtualized environments.

It is desirable to modify ZooKeeper to handle this situation (as much as
possible) internally. It would need to be done for both client - server
connections and server - server connections. One obvious solution is to
retry a few times (send ping) after getting a timeout. Another way is to
count the number of pings that have been sent after receiving the timeout.
If number of pings do not match the expected number (say 5 ping attempt
should be finished for a 5 sec timeout), then wait till all the pings are
finished. In effect do not completely rely on the clock. Any comments?

-Vishal

On Thu, Aug 19, 2010 at 3:52 AM, Qing Yan <qing...@gmail.com> wrote:

> Oh.. our servers are also running in a virtualized environment.
>
> On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite <waite....@gmail.com> wrote:
>
> > Hi,
> >
> > I have tripped over similar problems testing Red Hat Cluster in
> virtualised
> > environments.  I don't know whether recent linux kernels have improved
> > their
> > interaction with VMWare, but in our environments clock drift caused by
> lost
> > ticks can be substantial, requiring NTP to sometimes jump the clock
> rather
> > than control acceleration.   In one of our internal production rigs, the
> > local NTP servers themselves were virtualised - causing absolute mayhem
> > when
> > heavy loads hit the other guests on the same physical hosts.
> >
> > The effect on RHCS (v2.0) is quite dramatic.  A forward jump in time by
> 10
> > seconds always causes a member to prematurely time-out on a network read,
> > causing the member to drop out and trigger a cluster reconfiguration.
> > Apparently NTP is integrated with RHCS version 3, but I don't know what
> is
> > meant by that.
> >
> > I guess this post is not entirely relevent to ZK, but I am just making
> the
> > point that virtualisation (of NTP servers and or clients) can cause
> > repeated
> > premature timeouts.  On Linux, I believe that there is a class of timers
> > provided that is immune to this, but I doubt that there is a platform
> > independent way of coping with this.
> >
> > My 2p.
> >
> > regards,
> > Martin
> >
> > On 18 August 2010 16:53, Patrick Hunt <ph...@apache.org> wrote:
> >
> > > Do you expect the time to be "wrong" frequently? If ntp is running it
> > > should never get out of sync more than a small amount. As long as this
> is
> > > less than ~your timeout you should be fine.
> > >
> > > Patrick
> > >
> > >
> > > On 08/18/2010 01:04 AM, Qing Yan wrote:
> > >
> > >> Hi,
> > >>
> > >>    The testcase is fairly simple. We have a client which connects to
> ZK,
> > >> registers an ephemeral node and watches on it. Now change the client
> > >> machine's time - session killed..
> > >>
> > >>    Here is the log:
> > >>
> > >> *2010-08-18 04:24:57,782 INFO
> > >> com.taobao.timetunnel2.cluster.service.AgentService: Host name
> > >> kgbtest1.corp.alimama.com
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51
> > GMT
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:host.name=kgbtest1.corp.alimama.com
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:java.version=1.6.0_13
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:java.vendor=Sun Microsystems Inc.
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:java.home=/usr/java/jdk1.6.0_13/jre
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >>
> > >>
> >
> environment:java.class.path=/home/admin/TimeTunnel2/cluster/bin/../conf/agent/:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-log4j12-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/slf4j-api-1.5.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/timetunnel2-cluster-0.0.1-SNAPSHOT.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zookeeper-3.2.2.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/log4j-1.2.14.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/gson-1.4.jar:/home/admin/TimeTunnel2/cluster/bin/../lib/zk-recipes.jar
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >>
> > >>
> >
> environment:java.library.path=/usr/java/jdk1.6.0_13/jre/lib/amd64/server:/usr/java/jdk1.6.0_13/jre/lib/amd64:/usr/java/jdk1.6.0_13/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:java.io.tmpdir=/tmp
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:java.compiler=<NA>
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:os.name=Linux
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:os.arch=amd64
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:os.version=2.6.18-164.el5
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:user.name=admin
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:user.home=/home/admin
> > >> 2010-08-18 04:24:57,789 INFO org.apache.zookeeper.ZooKeeper: Client
> > >> environment:user.dir=/home/admin/TimeTunnel2/cluster/log
> > >> 2010-08-18 04:24:57,790 INFO org.apache.zookeeper.ZooKeeper:
> Initiating
> > >> client connection, connectString=xentest10-vm5.corp.alimama.com:2181,
> > >> xentest10-vm6.corp.alimama.com:2181,
> xentest10-vm9.corp.alimama.com:2181
> > >> sessionTimeout=600000
> > >> watcher=com.taobao.timetunnel2.cluster.service.agentserv...@48d6c16c
> > >> 2010-08-18 04:24:57,791 INFO org.apache.zookeeper.ClientCnxn:
> > >> zookeeper.disableAutoWatchReset is false
> > >> 2010-08-18 04:24:57,817 INFO org.apache.zookeeper.ClientCnxn:
> Attempting
> > >> connection to server xentest10-vm5.corp.alimama.com/10.254.5.160:2181
> > >> 2010-08-18 04:24:57,818 INFO org.apache.zookeeper.ClientCnxn: Priming
> > >> connection to java.nio.channels.SocketChannel[connected local=/
> > >> 10.254.2.160:48536 remote=
> > >> xentest10-vm5.corp.alimama.com/10.254.5.160:2181]
> > >> 2010-08-18 04:24:57,821 INFO org.apache.zookeeper.ClientCnxn: Server
> > >> connection successful
> > >> 2010-08-18 04:24:57,832 INFO
> > >> com.taobao.timetunnel2.cluster.service.AgentService: Register...
> > >> 2010-08-18 04:24:57,834 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> > >> reply
> > >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> > >> finished:false header:: -8,101  replyHeader:: -8,-1,0  request::
> > >> 0,v{},v{},v{}  response:: null
> > >> 2010-08-18 04:24:57,846 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> > >> reply
> > >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> > >> finished:false header:: 1,1  replyHeader:: 1,8589934649,-110
>  request::
> > >> '/agent,,v{s{31,s{'world,'anyone}}},0  response::
> > >> 2010-08-18 04:24:57,853 DEBUG
> > >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: Path
> > >> /agent
> > >> already exist
> > >> 2010-08-18 04:24:57,882 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> > >> reply
> > >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> > >> finished:false header:: 2,1  replyHeader:: 2,8589934650,-110
>  request::
> > >> '/agent/status,,v{s{31,s{'world,'anyone}}},0  response::
> > >> 2010-08-18 04:24:57,882 DEBUG
> > >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: Path
> > >> /agent/status already exist
> > >> 2010-08-18 04:24:57,886 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> > >> reply
> > >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> > >> finished:false header:: 3,1  replyHeader:: 3,8589934651,0  request::
> > >> '/agent/status/kgbtest1.corp.alimama.com
> ,,v{s{31,s{'world,'anyone}}},1
> > >> response:: '/agent/status/kgbtest1.corp.alimama.com
> > >> 2010-08-18 04:24:57,886 DEBUG
> > >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathCreator: Path
> > >> /agent/status/kgbtest1.corp.alimama.com creation OK
> > >> 2010-08-18 04:24:57,886 INFO
> > >> com.taobao.timetunnel2.cluster.service.AgentService: Watch for ->
> > >> /agent/status/kgbtest1.corp.alimama.com
> > >> 2010-08-18 04:24:57,892 DEBUG org.apache.zookeeper.ClientCnxn: Reading
> > >> reply
> > >> sessionid:0x12a8413f71a0003, packet:: clientPath:null serverPath:null
> > >> finished:false header:: 4,4  replyHeader:: 4,-1,0  request::
> > >> '/agent/status/
> > >> kgbtest1.corp.alimama.com,T  response::
> > >>
> > >>
> >
> ,s{8589934651,8589934651,1282116165777,1282116165777,0,0,0,84024764344696835,0,0,8589934651}
> > >>
> > >> 2010-08-18 04:24:57,892 INFO
> > >> com.taobao.timetunnel2.cluster.service.AgentService: Data changed
> > >> @/agent/status/kgbtest1.corp.alimama.com
> > >> 2010-08-18 04:25:11,226 DEBUG org.apache.zookeeper.ClientCnxn: Got
> ping
> > >> response for sessionid:0x12a8413f71a0003 after 1ms
> > >> 2010-08-18 04:25:24,560 DEBUG org.apache.zookeeper.ClientCnxn: Got
> ping
> > >> response for sessionid:0x12a8413f71a0003 after 2ms
> > >> 2010-08-18 04:25:37,893 DEBUG org.apache.zookeeper.ClientCnxn: Got
> ping
> > >> response for sessionid:0x12a8413f71a0003 after 1ms
> > >> 2010-08-18 04:19:44,853 DEBUG org.apache.zookeeper.ClientCnxn: Got
> > >> notification sessionid:0x12a8413f71a0003
> > >> 2010-08-18 04:19:44,854 DEBUG org.apache.zookeeper.ClientCnxn: Got
> > >> WatchedEvent: Znode change. Path:
> > >> /agent/status/kgbtest1.corp.alimama.comType: NodeDeleted for sessionid
> > >> 0x12a8413f71a0003
> > >> 2010-08-18 04:19:44,854 DEBUG
> > >> com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher:
> > >> getChildren event =>WatchedEvent: Znode change. Path: /agent/status/
> > >> kgbtest1.corp.alimama.com Type: NodeDeleted
> > >> 2010-08-18 04:19:44,855 WARN org.apache.zookeeper.ClientCnxn:
> Exception
> > >> closing session 0x12a8413f71a0003 to
> > sun.nio.ch.selectionkeyi...@7ca3d4cf
> > >> java.io.IOException: Read error rc = -1
> java.nio.DirectByteBuffer[pos=0
> > >> lim=4 cap=4]
> > >>     at
> > >>
> > >>
> >
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):701)
> > >>     at
> > >>
> > >>
> >
> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
> > >> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn:
> Attempting
> > >> connection to server xentest10-vm6.corp.alimama.com/10.254.5.161:2181
> > >> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: Priming
> > >> connection to java.nio.channels.SocketChannel[connected local=/
> > >> 10.254.2.160:45776 remote=
> > >> xentest10-vm6.corp.alimama.com/10.254.5.161:2181]
> > >> 2010-08-18 04:19:45,591 INFO org.apache.zookeeper.ClientCnxn: Server
> > >> connection successful
> > >> 2010-08-18 04:19:45,593 WARN org.apache.zookeeper.ClientCnxn:
> Exception
> > >> closing session 0x12a8413f71a0003 to
> > sun.nio.ch.selectionkeyi...@2fdb7df8
> > >> java.io.IOException: Session Expired
> > >>     at
> > >>
> > >>
> >
> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(zookeeper:ClientCnxn.java):589)
> > >>     at
> > >>
> > >>
> >
> org.apache.zookeeper.ClientCnxn$SendThread.doIO(zookeeper:ClientCnxn.java):709)
> > >>     at
> > >>
> > >>
> >
> org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):945)
> > >>
> > >> *   I guess ZK might rely on timestamp to  keep sessions alive, but we
> > >> have
> > >> NTP daemon running so machine time can get changed
> > >> automatically, is there a conflict?
> > >>
> > >>
> >
>

Reply via email to