Hello all,

I am currently testing various HA scenarios on a small Cassandra cluster of 8
nodes with RF=3. My test environment has Thrift clients that double-write every
operation to both the Cassandra cluster and a reliable storage, and then
cross-check the read results. Reads are performed with CL=ONE due to latency
requirements. I tested how failover and failback work.

I found that on failback, a lot of data mismatches between the reliable storage
and Cassandra were detected as soon as the failed-back node started to accept
reads. Later, once hinted handoff had completed, no more mismatches were ever
reported.

So the idea is that, even with CL=ONE, we could have almost no inconsistencies
on node failback if a Cassandra node started to accept reads only after hinted
handoff to it has completed.

So the question is: is it possible for a Thrift client to know the current
status of hinted handoff to a node that has just failed back?
That way, clients could wait for HH to complete, avoiding the just-failed-back
node and rerouting queries to other endpoints while that node synchronizes
itself with the cluster.
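The client-side rerouting described above could be sketched roughly like this in Python. This is not an existing Cassandra or Thrift API: `pick_read_endpoint` and the `hh_pending` callback are hypothetical, and how a client would actually obtain the hinted handoff status is exactly the open question. The sketch only shows the routing decision once that status is known.

```python
from typing import Callable, Sequence


def pick_read_endpoint(replicas: Sequence[str],
                       hh_pending: Callable[[str], bool]) -> str:
    """Choose a replica for a CL=ONE read, skipping nodes still receiving hints.

    `hh_pending` is a hypothetical callback that returns True while hinted
    handoff toward the given node is still in progress.
    """
    if not replicas:
        raise ValueError("no replicas available for this key")
    # Prefer replicas that have already caught up via hinted handoff.
    for node in replicas:
        if not hh_pending(node):
            return node
    # Every replica is still syncing: fall back to the first one instead of
    # failing the read outright (it may return stale data).
    return replicas[0]


if __name__ == "__main__":
    # Illustrative addresses only; 10.0.0.3 has just failed back.
    still_syncing = {"10.0.0.3"}
    endpoint = pick_read_endpoint(["10.0.0.3", "10.0.0.4", "10.0.0.5"],
                                  lambda node: node in still_syncing)
    print(endpoint)  # 10.0.0.4
```

Falling back to a syncing replica when no other one is available keeps reads available at the cost of possibly stale results, which matches the CL=ONE trade-off already being made.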
