Hello all, I am currently testing various HA scenarios on a small Cassandra cluster of 8 nodes with RF=3. My test environment has Thrift clients that double-write every operation to both the Cassandra cluster and a reliable storage, and cross-check read results between the two. Reads are performed with CL=ONE due to latency requirements. I tested how failover and failback work.
I found that on failback, a lot of data mismatches between the reliable storage and Cassandra were discovered as soon as the failed-back node started to accept reads. Later, once hinted handoff had completed, no more mismatches were reported. So the idea is that even with CL=ONE we could have almost no inconsistencies on node failback if the Cassandra node started to accept reads only after hinted handoff is completed. So the question is: is it possible for a Thrift client to know the current hinted handoff status of a node that has just failed back? That way clients could avoid querying the just-failed-back node until HH completes, and reroute queries to other endpoints while the node synchronizes itself with the cluster.
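To make the idea more concrete, here is a rough Python sketch of the client-side behaviour I have in mind. It is only an illustration under assumptions: the class name `FailbackAwareSelector`, the `handoff_done` callback, and the `max_quarantine_s` fallback timeout are all hypothetical names I made up. In particular, `handoff_done` stands in for exactly the piece I am asking about (some way to learn that hinted handoff to a node has finished); I am not claiming Cassandra or Thrift exposes such a call.

```python
import time


class FailbackAwareSelector:
    """Round-robin read endpoint selection that skips a node which has just
    failed back until a (hypothetical) check says hinted handoff is done."""

    def __init__(self, endpoints, handoff_done, max_quarantine_s=600.0,
                 clock=time.monotonic):
        self._endpoints = list(endpoints)
        # handoff_done(endpoint) -> bool is an ASSUMED hook, not a real API.
        self._handoff_done = handoff_done
        # Safety net: stop quarantining after this many seconds regardless.
        self._max_quarantine_s = max_quarantine_s
        self._clock = clock
        self._down = set()
        self._quarantined = {}  # endpoint -> time it came back up
        self._next = 0

    def mark_down(self, endpoint):
        self._down.add(endpoint)
        self._quarantined.pop(endpoint, None)

    def mark_up(self, endpoint):
        # Only a node that was previously seen down is quarantined.
        if endpoint in self._down:
            self._down.discard(endpoint)
            self._quarantined[endpoint] = self._clock()

    def _readable(self, endpoint):
        if endpoint in self._down:
            return False
        since = self._quarantined.get(endpoint)
        if since is None:
            return True
        expired = self._clock() - since >= self._max_quarantine_s
        if self._handoff_done(endpoint) or expired:
            del self._quarantined[endpoint]
            return True
        return False

    def pick_for_read(self):
        """Return the next endpoint that is up and not quarantined."""
        n = len(self._endpoints)
        for i in range(n):
            candidate = self._endpoints[(self._next + i) % n]
            if self._readable(candidate):
                self._next = (self._next + i + 1) % n
                return candidate
        raise RuntimeError("no readable endpoint available")
```

The client would call `mark_down` / `mark_up` when it detects a node failing and returning, and `pick_for_read` before each CL=ONE read. Writes are deliberately not affected by the quarantine in this sketch. The timeout is there only so that a node is not excluded forever if the completion signal never arrives.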