Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5055 to look at the new patch set (#5).

Change subject: KUDU-768 (part 1) - Unify leader/follower mvcc behavior
......................................................................

KUDU-768 (part 1) - Unify leader/follower mvcc behavior

Safe time is a timestamp such that all transactions before it are known
and either completed or in-flight. Waiting for the Mvcc snapshot at
"safe time" to be "clean" yields repeatable reads: scans of a tablet at
a snapshot defined by a timestamp will always return the same results.

Proper "safe time" advancement also improves load balancing: a scan at a
clean timestamp that is lower than "safe time" on a replica is guaranteed
to yield the same results as the same scan on the leader replica (though
maybe with a latency penalty).

Currently this timestamp is advanced within Mvcc, but this is not natural,
as it conflates the consensus state (all the operations that are being
replicated and/or replayed) and the mvcc state (all the operations that
have been consensus committed and are being applied). Furthermore, Mvcc
confusingly mixes the concepts of "safe time" and "clean time": the latter
means a timestamp such that all operations have been completed, whereas
the former also includes the operations that are in-flight, even if they
haven't started being applied to the tablet.

This patch series aims at separating the two concepts and fixing safe
time advancement:

a) - Safe time advancement will be handled by consensus: the leader can
easily establish which timestamps are safe for a replica by looking at
which operations that replica knows and what the timestamp of the last
committed operation is.

b) - Mvcc will only take care of monitoring "clean time" advancement.
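
To make the distinction between "safe" and "clean" above concrete, here is
a minimal toy sketch in Python. It is not Kudu code: the class and method
names (ReplicaSketch, start_op, finish_op, advance_safe_time) are
hypothetical, timestamps are plain integers, and "before" is assumed to
mean "at or before".

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaSketch:
    """Toy model of one replica. All names are hypothetical, not Kudu APIs."""

    # Timestamps of operations this replica knows about but has not finished applying.
    in_flight: set = field(default_factory=set)
    # Assumption: every operation with timestamp <= safe_time is already known here.
    safe_time: int = 0

    def start_op(self, ts: int) -> None:
        # A new operation may never appear at or below safe time,
        # otherwise "safe" would not mean "all operations are known".
        if ts <= self.safe_time:
            raise ValueError(f"timestamp {ts} is not above safe time {self.safe_time}")
        self.in_flight.add(ts)

    def finish_op(self, ts: int) -> None:
        # The operation has been fully applied to the tablet.
        self.in_flight.remove(ts)

    def advance_safe_time(self, ts: int) -> None:
        # In the design described above this would be driven by consensus.
        self.safe_time = max(self.safe_time, ts)

    def is_safe(self, ts: int) -> bool:
        # Safe: no unknown operation can still show up at or before ts.
        return ts <= self.safe_time

    def is_clean(self, ts: int) -> bool:
        # Clean: no known operation at or before ts is still in flight.
        return all(op > ts for op in self.in_flight)

    def can_serve_repeatable_read(self, ts: int) -> bool:
        # Check "safe" first, then "clean": only the two together
        # guarantee that a scan at ts always sees the same data.
        return self.is_safe(ts) and self.is_clean(ts)
```

In this sketch, a replica with operations in flight at timestamps 5 and 8
and safe time advanced to 6 reports timestamp 6 as safe but not clean; once
the operation at 5 finishes, timestamp 6 is both safe and clean, so a scan
at 6 is repeatable even though the operation at 8 is still in flight.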

Separating the two makes it simpler to wait for a timestamp to be "safe"
and "clean": the caller will first wait for a timestamp to be "safe",
meaning all operations are known and in-flight, and then wait for it to
be "clean" in mvcc, meaning all the in-flight operations before it have
completed.

This patch in particular takes the first two steps in this direction:

1) It moves timestamp assignment out of the tablet and into the
TransactionDriver, to be done prior to pushing the operation to consensus
for replication. Follow-up patches will move it to be done within
consensus itself (though not necessarily managed by any of the consensus
classes).

2) It makes all operations be "operations at a timestamp", making all
operations have the same behavior within mvcc independently of whether
they were started at the leader or at a follower.

Follow-up patches will completely remove the Mvcc APIs for automatic safe
time advancement and timestamp assignment and will introduce the new
entity responsible for "safe time".

Change-Id: I3ba7212f9211f585d4bef00e5ccfc24d5eece224
---
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
7 files changed, 51 insertions(+), 18 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5055/5
--
To view, visit http://gerrit.cloudera.org:8080/5055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba7212f9211f585d4bef00e5ccfc24d5eece224
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot