> As you see, this node thinks lots of ranges are out of sync, which shouldn't
> be the case as successful repairs were done every night prior to the
> upgrade.

Could this be explained by writes occurring during the upgrade process?

> I found this bug which touches timestamps and tombstones which was fixed in
> 1.1.10 but am not 100% sure if it could be related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-5153

Me neither, but the issue was fixed in 1.1.10.

> It appears that the repair task that I executed after the upgrade brought
> lots of deleted rows back to life.

Was it entire rows or columns in a row? Do you know if row-level or column-level deletes were used?

Can you look at the data in cassandra-cli and confirm the timestamps on the columns make sense?
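For example (the keyspace name and row key below are placeholders, and the output line is only illustrative), a quick session along these lines shows the client-supplied timestamp on each column:

    $ cassandra-cli -h localhost
    [default@unknown] use AppKeyspace;
    [default@AppKeyspace] get App['some_row_key'];
    => (column=some_column, value=..., timestamp=1363400000000000)

A column written with the usual microseconds-since-epoch convention around the time of this thread should carry a 16-digit timestamp like the one above; a 19-digit value would point at nanoseconds, and timestamps that don't correspond to plausible write times are exactly the kind of thing to look for.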
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/03/2013, at 2:31 PM, Arya Goudarzi <gouda...@gmail.com> wrote:

> Hi,
>
> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running
> repairs. It appears that the repair task that I executed after the upgrade
> brought lots of deleted rows back to life. Here are some logistics:
>
> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6;
> - Old cluster: 4 nodes, C* 1.1.6 with RF 3 using NetworkTopology;
> - Upgraded to: 1.1.10 with all other settings the same;
> - Successful repairs were being done on this cluster every night;
> - Our clients use nanosecond precision timestamps for Cassandra calls;
> - After the upgrade, while running repair, I saw some log messages like this
>   on one node:
>
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.207.56 have 2223 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /23.20.207.56 have 161 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.250.43 have 2294 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 AntiEntropyService.java (line 789) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining column family to sync for this session)
>
> As you see, this node thinks lots of ranges are out of sync, which shouldn't
> be the case as successful repairs were done every night prior to the
> upgrade.
>
> The App CF uses SizeTiered compaction with a gc_grace of 10 days. It has
> caching = 'ALL', and it is fairly small (11 MB on each node).
>
> I found this bug which touches timestamps and tombstones which was fixed in
> 1.1.10 but am not 100% sure if it could be related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-5153
>
> Any advice on how to dig deeper into this would be appreciated.
>
> Thanks,
> -Arya
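For reference on the nanosecond-precision point above: Cassandra resolves competing writes and deletes purely by comparing the client-supplied timestamps numerically, and the usual client convention is microseconds since the epoch (a 16-digit value in 2013). If writes and deletes are ever issued at different precisions, a tombstone can end up with a lower timestamp than the data it was meant to delete and will never shadow it, which looks exactly like deleted data coming back after repair. The scale difference is easy to see (GNU date shown; values are illustrative):

    $ date +%s     # seconds since the epoch: 10 digits
    1363400000
    $ date +%s%N   # nanoseconds since the epoch: 19 digits, 1000x the microsecond convention
    1363400000000000000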