On 7/28/22 08:35, José Armando García Sancio wrote:
> B) I started the majority of the nodes (1, 2). The ensemble was
> established and I was able to create a znode using the CLI.
> C) I shutdown all of the nodes (1, 2 since I never started node 3). To
> simulate a disk failure I deleted the content of the transaction and
> snapshot directory (version-2) for node 2.
Note that at this point only node 1 knows about the znode you created in
step B.
> D) I started the majority of the nodes (2, 3). The ensemble was
> established and I was able to establish a connection with the CLI.
> E) I finally started node 1 which had the committed transactions and
> snapshots. The znode created in step B) was not present.
Node 1 is most likely informed that its database is now out of date (or
it decides that for itself) so it syncs the whole DB from the current
leader, which will not know about the znode created in step B.
I am not in any way a ZK expert, but that seems like the most logical
way for it to work.
I'm just guessing that there is some timestamp which declares the last
time a database was running with quorum and that comparing those
timestamps is how ZK decides that a node's database is out of date. I
am curious as to whether I have deduced things incorrectly.
Thanks,
Shawn