Hi Aaron: Thanks for your attention.
The cluster in question is a 4-node sandbox cluster of ours that does not have much traffic. I was able to chase down this issue on a CF that doesn't change much. That bug was flagged as fixed in 1.1.10. They were row-level deletes. We use nanosecond-precision timestamps, so they look something like this: 1363379219546536704. That one is recent, though, from last Friday; I still have to find one with an old timestamp that came back to life. I'm doing this investigation this week, and once I collect more info and reproduce from a snapshot, I'll let you know.

Cheers,
-Arya

On Mon, Mar 18, 2013 at 10:45 AM, aaron morton <aa...@thelastpickle.com> wrote:

> As you see, this node thinks lots of ranges are out of sync, which
> shouldn't be the case, as successful repairs were done every night prior to
> the upgrade.
>
> Could this be explained by writes occurring during the upgrade process?
>
> I found this bug, which touches timestamps and tombstones and was fixed in
> 1.1.10, but I am not 100% sure if it could be related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-5153
>
> Me neither, but the issue was fixed in 1.1.10.
>
> It appears that the repair task that I executed after the upgrade brought
> back to life lots of deleted rows.
>
> Was it entire rows, or columns in a row?
> Do you know if row-level or column-level deletes were used?
>
> Can you look at the data in cassandra-cli and confirm that the timestamps
> on the columns make sense?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/03/2013, at 2:31 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>
> Hi,
>
> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running
> repairs. It appears that the repair task that I executed after the upgrade
> brought back to life lots of deleted rows.
> Here are some logistics:
>
> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6;
> - Old cluster: 4 nodes, C* 1.1.6 with RF3 using NetworkTopology;
> - Upgraded to: 1.1.10 with all other settings the same;
> - Successful repairs were being done on this cluster every night;
> - Our clients use nanosecond-precision timestamps for Cassandra calls;
> - After the upgrade, while running repair, I saw log messages like this on
> one node:
>
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847
> AntiEntropyService.java (line 1022) [repair
> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and
> /23.20.207.56 have 2223 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877
> AntiEntropyService.java (line 1022) [repair
> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and
> /23.20.207.56 have 161 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097
> AntiEntropyService.java (line 1022) [repair
> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and
> /23.20.250.43 have 2294 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190
> AntiEntropyService.java (line 789) [repair
> #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining
> column family to sync for this session)
>
> As you see, this node thinks lots of ranges are out of sync, which
> shouldn't be the case, as successful repairs were done every night prior to
> the upgrade.
>
> The App CF uses SizeTiered compaction with a gc_grace of 10 days. It has
> caching = 'ALL', and it is fairly small (11 MB on each node).
>
> I found this bug, which touches timestamps and tombstones and was fixed in
> 1.1.10, but I am not 100% sure if it could be related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-5153
>
> Any advice on how to dig deeper into this would be appreciated.
>
> Thanks,
> -Arya
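[Editor's note: the thread mentions nanosecond-precision client timestamps and Aaron's suggestion to confirm the column timestamps make sense. A quick way to do that check, not part of the original thread, is to classify the timestamp by magnitude and decode it to UTC; the sketch below uses the example value quoted above and assumes nanoseconds since the Unix epoch.]

```python
from datetime import datetime, timezone

# Example column timestamp quoted in the thread (nanoseconds since the Unix epoch).
ts = 1363379219546536704

# Rough magnitude check: ~1e18 -> nanoseconds, ~1e15 -> microseconds
# (Cassandra's usual client default), ~1e12 -> milliseconds, ~1e9 -> seconds.
assert 10**17 < ts < 10**19, "value is not in the nanosecond range"

# Convert to a UTC datetime to confirm the write time is plausible.
dt = datetime.fromtimestamp(ts // 10**9, tz=timezone.utc)
print(dt.isoformat())  # 2013-03-15T20:26:59+00:00 -- the "last Friday" mentioned above
```

A resurrected column whose decoded timestamp is far older than gc_grace (10 days here) would be the smoking gun Arya is looking for.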