Repair cannot instantly build a perfectly consistent view of the data across your three replicas at a single point in time. When a piece of data is written, there is a delay, even if it's only 500 ms, before it is applied on every node. So when repair reads the data and builds its Merkle trees, validation may finish on node1 at 12:01 while node2 finishes at 12:02, and over that one-minute delta (or even a few seconds, or when using snapshot repairs) the partition/range hashes in the Merkle trees can differ. On a moving data set it is almost impossible for the replicas to be perfectly in sync at the moment a repair runs, so I wouldn't worry about that log message. If you are worried about consistency between your reads and writes, use EACH_QUORUM or LOCAL_QUORUM for both.
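To make the timing point concrete, here is a toy Python sketch (the function and variable names are mine, not Cassandra internals) of why validation hashes computed at slightly different moments disagree: if one replica is hashed before an in-flight write lands and another after, the range hashes differ even though both replicas will converge moments later.

```python
import hashlib

def range_hash(rows):
    """Hash a sorted set of (key, value) rows -- a stand-in for the
    hash of one Merkle-tree range computed during repair validation."""
    h = hashlib.sha256()
    for key, value in sorted(rows):
        h.update(f"{key}:{value}".encode())
    return h.hexdigest()

# Replica A was validated after the latest write had already landed.
replica_a = {("k1", "v1"), ("k2", "v2"), ("k3", "v3")}

# Replica B was validated a moment earlier, before the write of k3
# arrived, so the snapshot it hashed is missing that row.
replica_b = {("k1", "v1"), ("k2", "v2")}

# The range hashes differ, so repair reports the range "out of sync",
# even though replica B receives k3 milliseconds later anyway.
print(range_hash(replica_a) != range_hash(replica_b))  # -> True
```

The same effect applies at any tree depth: a single late-arriving row changes the leaf hash, which changes every hash up to the root.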
Chris

On Thu, Mar 30, 2017 at 1:22 AM, Roland Otta <roland.o...@willhaben.at> wrote:
> hi,
>
> we see the following behaviour in our environment:
>
> cluster consists of 6 nodes (cassandra version 3.0.7). keyspace has a
> replication factor 3.
> clients are writing data to the keyspace with consistency one.
>
> we are doing parallel, incremental repairs with cassandra reaper.
>
> even if a repair just finished and we are starting a new one
> immediately, we can see the following entries in our logs:
>
> INFO [RepairJobTask:1] 2017-03-30 10:14:00,782 SyncTask.java:73 -
> [repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:2] 2017-03-30 10:14:00,782 SyncTask.java:73 -
> [repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188
> and /192.168.0.189 have 1 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:4] 2017-03-30 10:14:00,782 SyncTask.java:73 -
> [repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:2] 2017-03-30 10:14:03,997 SyncTask.java:73 -
> [repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26
> and /192.168.0.189 have 2 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:1] 2017-03-30 10:14:03,997 SyncTask.java:73 -
> [repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26
> and /192.168.0.191 have 2 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:4] 2017-03-30 10:14:03,997 SyncTask.java:73 -
> [repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.191 have 2 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:1] 2017-03-30 10:14:05,375 SyncTask.java:73 -
> [repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:2] 2017-03-30 10:14:05,375 SyncTask.java:73 -
> [repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
> and /192.168.0.190 have 1 range(s) out of sync for ad_event_history
> INFO [RepairJobTask:4] 2017-03-30 10:14:05,375 SyncTask.java:73 -
> [repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.190
> and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
>
> we cant see any hints on the systems ... so we thought everything is
> running smoothly with the writes.
>
> do we have to be concerned about the nodes always being out of sync or
> is this a normal behaviour in a write intensive table (as the tables
> will never be 100% in sync for the latest inserts)?
>
> bg,
> roland
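As a footnote to the quorum suggestion above: the setup in the question (replication factor 3, writes at consistency ONE) cannot guarantee that a read sees the latest write, because the one replica that acknowledged the write may not be the one that serves the read. The standard overlap arithmetic can be sketched in a few lines of Python (helper names are mine, purely illustrative, not driver APIs):

```python
# With RF=3, a write at CL ONE is acknowledged by a single replica, so a
# read (also at ONE) is not guaranteed to touch that replica. Quorum on
# both sides forces the read set and write set to intersect.

def quorum(rf: int) -> int:
    """Replicas required for a QUORUM operation: floor(rf/2) + 1."""
    return rf // 2 + 1

def overlap_guaranteed(rf: int, write_cl: int, read_cl: int) -> bool:
    """A read is guaranteed to see the latest write when the replica
    counts satisfy write_cl + read_cl > rf."""
    return write_cl + read_cl > rf

rf = 3
print(overlap_guaranteed(rf, write_cl=1, read_cl=1))   # ONE/ONE    -> False
print(overlap_guaranteed(rf, quorum(rf), quorum(rf)))  # QUORUM x2  -> True
```

With RF=3 a quorum is 2 replicas, and 2 + 2 = 4 > 3, so every quorum read overlaps every quorum write on at least one replica.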