Re: Session expiration caused by time change

2010-08-20 Thread Martin Waite
Hi, In our testing of Red Hat Cluster, we could reproduce the NTP impact by jumping the clock backwards and forwards, just using the date command in a tight-ish loop: use strict; my $dir = 1; while (1) { jump_time( $dir ); $dir = $dir * -1; } sub jump_time { my ($dir) = @_; my

Re: Session expiration caused by time change

2010-08-20 Thread Benjamin Reed
i put up a patch that should address the problem. now i need to write a test case. the only way i can think of is to change the call to System.currentTimeMillis to a utility class that calls System.currentTimeMillis that i can mock for testing. any better ideas? ben
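A minimal sketch of the injectable clock Ben describes follows. All names here (Clock, SystemClock, FakeClock, SessionDeadline) are hypothetical and are not ZooKeeper classes. Production code would take a Clock instead of calling System.currentTimeMillis() directly, and a test would pass a FakeClock that it can step forwards or backwards.

```java
// Hypothetical names (Clock, SystemClock, FakeClock, SessionDeadline) chosen
// for illustration; they are not ZooKeeper classes.
interface Clock {
    long currentTimeMillis();
}

// Production implementation: delegates to the JVM's wall clock.
final class SystemClock implements Clock {
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

// Test implementation: time moves only when the test calls jump(), so a test
// can simulate the clock being stepped forward or backward.
final class FakeClock implements Clock {
    private long now;

    FakeClock(long startMillis) {
        this.now = startMillis;
    }

    @Override
    public synchronized long currentTimeMillis() {
        return now;
    }

    // A negative delta moves the clock backwards.
    synchronized void jump(long deltaMillis) {
        now += deltaMillis;
    }
}

// Example consumer: it never calls System.currentTimeMillis() directly,
// so its expiry decision can be driven entirely by a FakeClock in tests.
final class SessionDeadline {
    private final Clock clock;
    private final long deadlineMillis;

    SessionDeadline(Clock clock, long timeoutMillis) {
        this.clock = clock;
        this.deadlineMillis = clock.currentTimeMillis() + timeoutMillis;
    }

    boolean isExpired() {
        return clock.currentTimeMillis() >= deadlineMillis;
    }
}
```

In a test, `clock.jump(60_000L)` reproduces a forward NTP step deterministically, without touching the host's real clock.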

Re: Session expiration caused by time change

2010-08-20 Thread Ted Dunning
Mocking the time via a utility was my thought. Mocking system itself is scary.

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
You can always increase your timeouts a bit. On Thu, Aug 19, 2010 at 12:52 AM, Qing Yan qing...@gmail.com wrote: Oh.. our servers are also running in a virtualized environment. On Thu, Aug 19, 2010 at 2:58 PM, Martin Waite waite@gmail.com wrote: Hi, I have tripped over similar

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi, I remember Ben had opened a jira for clock jumps earlier: https://issues.apache.org/jira/browse/ZOOKEEPER-366. It is not uncommon to have clocks jump forward in virtualized environments. It is desirable to modify ZooKeeper to handle this situation (as much as possible) internally. It would

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Another option would be for the cluster to compare times and note when one member seems to be lagging. Restoration of that lag would then be less remarkable. I believe that the pattern of these problems is a slow slippage behind and a sudden jump forward.
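One way to "note when one member seems to be lagging" is to compare each member's reported time with the cluster median. The sketch below is hypothetical (LagDetector and its parameters are not part of ZooKeeper) and assumes all readings were collected at roughly the same instant; a real implementation would also have to correct for network delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: flags members whose clock trails the cluster median
// by more than toleranceMillis. Assumes readings were taken at about the
// same moment and ignores network delay.
final class LagDetector {
    static List<String> laggingMembers(Map<String, Long> reportedMillis, long toleranceMillis) {
        List<String> lagging = new ArrayList<>();
        if (reportedMillis.isEmpty()) {
            return lagging;
        }
        long[] sorted = reportedMillis.values().stream()
                .mapToLong(Long::longValue)
                .sorted()
                .toArray();
        // The median is robust to one member being far off in either direction.
        // For an even number of members this picks the upper median.
        long median = sorted[sorted.length / 2];
        // Iterating a TreeMap gives a deterministic, id-sorted result.
        for (Map.Entry<String, Long> e : new TreeMap<>(reportedMillis).entrySet()) {
            if (median - e.getValue() > toleranceMillis) {
                lagging.add(e.getKey());
            }
        }
        return lagging;
    }
}
```

With readings a=10000, b=10010, c=7000 and a 500 ms tolerance, only "c" is flagged; a later forward jump by "c" could then be treated as catching up rather than as a reason to expire sessions.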

Re: Session expiration caused by time change

2010-08-19 Thread Martin Waite
Hi, I'm not sure if you mean the timers I was on about earlier. If so, http://linux.die.net/man/3/clock_gettime Sufficiently recent versions of GNU libc and the Linux kernel support the following clocks: ... *CLOCK_MONOTONIC* Clock that cannot be set and represents monotonic time since some
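In Java the usual counterpart of CLOCK_MONOTONIC is System.nanoTime(), which the JDK documents as suitable only for measuring elapsed time and unrelated to wall-clock time (on Linux, HotSpot typically backs it with clock_gettime(CLOCK_MONOTONIC)). A minimal sketch of a timeout measured this way; the class name MonotonicTimeout is hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Sketch of a timeout measured with System.nanoTime(). Setting the date or an
// NTP step changes System.currentTimeMillis() but not nanoTime() differences.
final class MonotonicTimeout {
    private final long startNanos;
    private final long timeoutNanos;

    MonotonicTimeout(long timeout, TimeUnit unit) {
        this.startNanos = System.nanoTime();
        this.timeoutNanos = unit.toNanos(timeout);
    }

    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    boolean isExpired() {
        // Compare differences rather than raw values: nanoTime() values may be
        // negative and may overflow, so only t1 - t0 is meaningful.
        return System.nanoTime() - startNanos >= timeoutNanos;
    }
}
```

The trade-off is that a monotonic clock cannot be compared across machines, so it only helps with local elapsed-time checks such as session timeouts on a single server.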

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
True. But it knows that there has been a jump. Quiet time can be distinguished from clock shift by assuming that members of the cluster don't all jump at the same time. I would imagine that a recent clock jump estimate could be kept and buckets that would otherwise expire due to such a jump
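A single node can notice that "there has been a jump" by sampling the wall clock and a monotonic clock together and comparing how far each moved. The sketch below is a hypothetical helper (ClockJumpDetector is not a ZooKeeper class); it also keeps the most recent jump estimate, in the spirit of Ted's suggestion. The threshold is an assumed tuning parameter.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: estimates clock steps by comparing how far the wall
// clock moved with how far a monotonic clock moved between two samples.
final class ClockJumpDetector {
    private final long thresholdMillis;
    private long lastWallMillis;
    private long lastMonoNanos;
    private long lastJumpMillis; // most recent jump estimate; 0 if none seen yet

    ClockJumpDetector(long thresholdMillis, long wallMillis, long monoNanos) {
        this.thresholdMillis = thresholdMillis;
        this.lastWallMillis = wallMillis;
        this.lastMonoNanos = monoNanos;
    }

    // Returns the estimated jump (positive = forward, negative = backward),
    // or 0 if the two clocks agree to within the threshold.
    long sample(long wallMillis, long monoNanos) {
        long wallDelta = wallMillis - lastWallMillis;
        long monoDelta = TimeUnit.NANOSECONDS.toMillis(monoNanos - lastMonoNanos);
        lastWallMillis = wallMillis;
        lastMonoNanos = monoNanos;
        long skew = wallDelta - monoDelta;
        if (Math.abs(skew) < thresholdMillis) {
            return 0L;
        }
        lastJumpMillis = skew;
        return skew;
    }

    // Convenience for production use with the real clocks.
    long sampleNow() {
        return sample(System.currentTimeMillis(), System.nanoTime());
    }

    long lastJumpMillis() {
        return lastJumpMillis;
    }
}
```

Because the readings are passed in, the detector can be tested with synthetic values; sampleNow() is the only place that touches the real clocks.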

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
yes, you are right. we could do this. it turns out that the expiration code is very simple: while (running) { currentTime = System.currentTimeMillis(); if (nextExpirationTime > currentTime) { this.wait(nextExpirationTime -
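Ben's snippet is cut off after `this.wait(nextExpirationTime -`, so the following is only a guess at the general shape of such a loop, not ZooKeeper's actual SessionTrackerImpl. It assumes sessions are grouped into buckets whose expiration times are rounded up to a multiple of a hypothetical expirationInterval, and shows why a forward clock step expires everything at once.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of bucketed session expiration; names and structure are
// illustrative, not ZooKeeper's implementation.
final class BucketedExpirer {
    private final long expirationInterval;
    // bucket expiration time -> ids of sessions that expire at that time
    private final TreeMap<Long, Set<Long>> buckets = new TreeMap<>();

    BucketedExpirer(long expirationInterval) {
        this.expirationInterval = expirationInterval;
    }

    // Round up to the next interval boundary so that many sessions share a bucket.
    long roundToInterval(long time) {
        return (time / expirationInterval + 1) * expirationInterval;
    }

    // Called on every client ping: move the session to its new bucket.
    synchronized void touch(long sessionId, long now, long timeoutMillis) {
        for (Set<Long> bucket : buckets.values()) {
            bucket.remove(sessionId); // linear scan; acceptable for a sketch
        }
        long when = roundToInterval(now + timeoutMillis);
        buckets.computeIfAbsent(when, k -> new HashSet<>()).add(sessionId);
        notifyAll(); // let run() recompute how long to sleep
    }

    // Removes and returns every session whose bucket time is <= currentTime.
    synchronized Set<Long> expireDue(long currentTime) {
        Set<Long> expired = new HashSet<>();
        while (!buckets.isEmpty() && buckets.firstKey() <= currentTime) {
            expired.addAll(buckets.pollFirstEntry().getValue());
        }
        return expired;
    }

    // Sleep until the next bucket is due, then expire it. If
    // currentTimeMillis() jumps forward, every skipped bucket looks due at once.
    synchronized void run(BooleanSupplier running) throws InterruptedException {
        while (running.getAsBoolean()) {
            long currentTime = System.currentTimeMillis();
            Map.Entry<Long, Set<Long>> next = buckets.firstEntry();
            if (next == null) {
                wait(expirationInterval);
                continue;
            }
            if (next.getKey() > currentTime) {
                wait(next.getKey() - currentTime);
                continue;
            }
            expireDue(currentTime); // a real tracker would close these sessions here
        }
    }
}
```

Calling `expireDue(60_000L)` after sessions were bucketed at 6000 and 10000 expires both in one pass, which is the behaviour a 50-second forward step would trigger.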

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Nice (modulo inverting the comparison in your text). Option 2 seems very simple. That always attracts me.

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ted, I haven't given it serious thought yet, but I don't think it is necessary for the cluster to keep track of time. A node can make its own decision. For the sake of argument, let's say that we have a client and a server with the following policy: 1. Client is supposed to send a ping to server

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
if we can't rely on the clock, we cannot say things like "if ... for 5 seconds". also, clients connect to servers, not vice versa, so we cannot say things like "server can attempt to reconnect". ben

Re: Session expiration caused by time change

2010-08-19 Thread Vishal K
Hi Ben, Comments inline. On Thu, Aug 19, 2010 at 5:33 PM, Benjamin Reed br...@yahoo-inc.com wrote: if we can't rely on the clock, we cannot say things like "if ... for 5 seconds". "If ... for 5 seconds" indicates the timeout given by the socket library. After the timeout we can verify that the

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Ben's approach is really simpler. The client already sends keep-alive messages and we know that some have gone missing or a time shift has happened. Those two possibilities are cleanly distinguished by Ben's suggestion of comparing current time to the bucket expiration. If current time is
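Ted's message is truncated right where the rule is stated ("If current time is"), so the following is one plausible reading, not the rule from the thread. Because the expiration loop sleeps until the next bucket's expiration time, it should wake close to that time; waking far past it suggests the wall clock stepped forward. ExpiryGuard and maxLatenessMillis are hypothetical names, and the tolerance is an assumed tuning knob.

```java
// Hypothetical helper: classifies a wake-up of the expiration loop by how far
// the current time is past the next bucket's expiration time.
final class ExpiryGuard {
    enum Action { WAIT, EXPIRE, SUSPECT_JUMP }

    static Action decide(long currentTime, long nextExpirationTime, long maxLatenessMillis) {
        if (currentTime < nextExpirationTime) {
            return Action.WAIT; // bucket not due yet
        }
        long lateness = currentTime - nextExpirationTime;
        // Slightly late: normal scheduling delay, so expire the bucket.
        // Far too late: more likely a forward clock step than missing pings.
        return lateness <= maxLatenessMillis ? Action.EXPIRE : Action.SUSPECT_JUMP;
    }
}
```

On SUSPECT_JUMP, one option would be to shift the pending buckets forward by the observed lateness so clients get a chance to ping before anything expires. A long GC pause or an overloaded host would look the same to this check, so it distinguishes "late wake-up" rather than proving a clock change.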

Re: Session expiration caused by time change

2010-08-19 Thread Benjamin Reed
i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch out. Qing (or anyone else), can you reproduce it pretty easily? thanx ben

Re: Session expiration caused by time change

2010-08-19 Thread Ted Dunning
Put in a four letter command that will put the server to sleep for 15 seconds! :-)

Session expiration caused by time change

2010-08-18 Thread Qing Yan
Hi, The testcase is fairly simple. We have a client which connects to ZK, registers an ephemeral node and watches on it. Now change the client machine's time - session killed.. Here is the log: *2010-08-18 04:24:57,782 INFO com.taobao.timetunnel2.cluster.service.AgentService: Host name

Re: Session expiration caused by time change

2010-08-18 Thread Ted Dunning
If NTP is changing your time by more than a few milliseconds then you have other problems (big ones). On Wed, Aug 18, 2010 at 1:04 AM, Qing Yan qing...@gmail.com wrote: I guess ZK might rely on timestamp to keep sessions alive, but we have NTP daemon running so machine time can get changed