Andrew Wong created KUDU-2906:
---------------------------------

             Summary: Don't allow elections when server clocks are too out of sync
                 Key: KUDU-2906
                 URL: https://issues.apache.org/jira/browse/KUDU-2906
             Project: Kudu
          Issue Type: Bug
          Components: consensus
    Affects Versions: 1.10.0
            Reporter: Andrew Wong


In cases where machine clocks are not properly synchronized, if the tablet 
replica that is elected leader has a clock that is very far in the future 
(greater than --max_clock_sync_error_usec=10 sec), it's possible that any 
writes that go to that tablet will be rejected by the followers, but still 
persisted to the leader's WAL.

Then, upon fixing the clock on that machine, the replica may try to replay the 
future op, but fail to replay it because the op timestamp is too far in the 
future, with errors like:
{code:java}
F0715 12:03:09.369819  3500 tablet_bootstrap.cc:904] Check failed: _s.ok() Bad 
status: Invalid argument: Tried to update clock beyond the max. error.{code}
Dumping a recovery WAL, I could see:
{code:java}
130.138@6400743143334211584 REPLICATE NO_OP
id { term: 130 index: 138 } timestamp: 6400743143334211584 op_type: NO_OP 
noop_request { }
COMMIT 130.138
op_type: NO_OP commited_op_id { term: 130 index: 138 }
131.139@6400743925559676928 REPLICATE NO_OP
id { term: 131 index: 139 } timestamp: 6400743925559676928 op_type: NO_OP 
noop_request { }
COMMIT 131.139
op_type: NO_OP commited_op_id { term: 131 index: 139 }
132.140@11589864471731939930 REPLICATE NO_OP
id { term: 132 index: 140 } timestamp: 11589864471731939930 op_type: NO_OP 
noop_request { }{code}
Note the drastic jump in timestamp.
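
To put a number on that jump, the timestamps from the dump can be decoded. This is only a rough sketch: it assumes Kudu's hybrid timestamps keep physical microseconds since the Unix epoch in the high bits and a 12-bit logical counter in the low bits. The helper names below are made up for illustration and are not Kudu APIs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the low 12 bits of a hybrid timestamp are a logical counter and
# the remaining high bits are physical microseconds since the Unix epoch.
LOGICAL_BITS = 12
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1

# Timestamps copied from the recovery WAL dump above, keyed by term.index.
WAL_TIMESTAMPS = {
    "130.138": 6400743143334211584,
    "131.139": 6400743925559676928,
    "132.140": 11589864471731939930,
}


def split_hybrid_timestamp(ts):
    """Return (physical_usec, logical) under the assumed 12-bit layout."""
    return ts >> LOGICAL_BITS, ts & LOGICAL_MASK


def physical_to_utc(physical_usec):
    """Convert physical microseconds since the epoch to an aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=physical_usec)


def describe(timestamps):
    """Print each op's decoded time and the gap to the previous op."""
    prev_physical = None
    for op_id, ts in timestamps.items():
        physical, logical = split_hybrid_timestamp(ts)
        line = f"{op_id}: {physical_to_utc(physical).isoformat()} logical={logical}"
        if prev_physical is not None:
            gap_sec = (physical - prev_physical) / 1_000_000
            line += f" (+{gap_sec:.0f}s since previous op)"
        print(line)
        prev_physical = physical


if __name__ == "__main__":
    describe(WAL_TIMESTAMPS)
```

Under that assumed layout, the first two ops decode to physical times about 191 seconds apart, while op 132.140 lands roughly 40 years after them, far beyond the 10 second limit mentioned above.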

In this specific case, we verified that the replayed WAL wasn't that far behind 
the recovery WAL, which had the future timestamps, so we could just delete the 
recovery WAL and bootstrap from the replayed WAL.

It would have been nice had those bad ops not been written at all, maybe by 
preventing an election between such mismatched servers in the first place.
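
One way to picture such a guard is below. It is a hypothetical sketch, not Kudu's vote handling: the `VoteRequest` type, its `candidate_now_usec` field, and `should_grant_vote` are invented for illustration, and the check ignores network latency as well as the normal term and log-matching rules a real voter would also apply.

```python
from dataclasses import dataclass

# Mirrors the 10 second value of --max_clock_sync_error_usec quoted in this report.
MAX_CLOCK_SYNC_ERROR_USEC = 10_000_000


@dataclass
class VoteRequest:
    """Hypothetical vote request that also carries the candidate's clock reading."""
    candidate_uuid: str
    candidate_term: int
    candidate_now_usec: int  # hypothetical field: candidate's wall clock, in microseconds


def should_grant_vote(request, voter_now_usec,
                      max_error_usec=MAX_CLOCK_SYNC_ERROR_USEC):
    """Deny the vote when the candidate's clock disagrees with ours by too much.

    Only the clock-skew condition is modeled here; a real implementation
    would combine it with the usual consensus checks.
    """
    skew_usec = abs(request.candidate_now_usec - voter_now_usec)
    return skew_usec <= max_error_usec


if __name__ == "__main__":
    voter_now = 1_562_681_622_451_093  # arbitrary example reading, in microseconds
    close = VoteRequest("peer-a", 132, voter_now + 2_000_000)     # 2 s ahead
    far = VoteRequest("peer-b", 132, voter_now + 3_600_000_000)   # 1 hour ahead
    print(should_grant_vote(close, voter_now))
    print(should_grant_vote(far, voter_now))
```

With a check along these lines, a replica whose clock is far ahead would not win an election against correctly synchronized peers, so it would never get the chance to write ops with future timestamps to its WAL as leader.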



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
