Hi Aaron:

Thanks for your attention.

The cluster in question is a 4-node sandbox cluster of ours that does not
get much traffic. I was able to chase down this issue on a CF that doesn't
change much.

That bug was flagged as fixed in 1.1.10.

They were row level deletes.
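
To be concrete about the distinction, here is a rough sketch using the
1.1-era Thrift API; this is illustrative only (host, keyspace, row key, and
column name below are placeholders, not our actual client code). A row-level
delete passes a ColumnPath naming only the column family, so a row tombstone
is written; a column-level delete also names the column, so only that column
gets a tombstone.

    import java.nio.ByteBuffer;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class DeleteSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("SomeKeyspace");

            ByteBuffer key = ByteBuffer.wrap("some-row-key".getBytes("UTF-8"));
            long ts = System.currentTimeMillis() * 1000000L; // nanosecond-style timestamp

            // Row-level delete: the ColumnPath names only the column family,
            // so the whole row is covered by a row tombstone.
            client.remove(key, new ColumnPath("App"), ts, ConsistencyLevel.QUORUM);

            // Column-level delete: the ColumnPath also names a column,
            // so only that single column is tombstoned.
            ColumnPath columnPath = new ColumnPath("App");
            columnPath.setColumn(ByteBuffer.wrap("some-column".getBytes("UTF-8")));
            client.remove(key, columnPath, ts, ConsistencyLevel.QUORUM);

            transport.close();
        }
    }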

We use nanosecond-precision timestamps, so they look something like
this: 1363379219546536704. That one is recent though, from last Friday; I still
have to find one with an old timestamp that came back to life. I'm doing that
investigation this week, and once I collect more info and can reproduce from a
snapshot, I'll let you know.
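
For reference, a value like 1363379219546536704 can be produced on the JVM
roughly like this; this is just a sketch of the general approach, not our
exact client code:

    public final class NanoTimestamps {
        private NanoTimestamps() {}

        // Wall-clock millis scaled to nanoseconds since the epoch, plus a
        // sub-millisecond component from System.nanoTime() so writes that land
        // in the same millisecond still get distinct timestamps.
        public static long nanoTimestamp() {
            long millis = System.currentTimeMillis();       // e.g. 1363379219546
            long subMillisNanos = System.nanoTime() % 1000000L;
            return millis * 1000000L + subMillisNanos;      // e.g. 1363379219546536704
        }

        public static void main(String[] args) {
            System.out.println(nanoTimestamp());
        }
    }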

Cheers,
-Arya



On Mon, Mar 18, 2013 at 10:45 AM, aaron morton <aa...@thelastpickle.com> wrote:

> As you see, this node thinks lots of ranges are out of sync, which
> shouldn't be the case since successful repairs were done every night prior to
> the upgrade.
>
> Could this be explained by writes occurring during the upgrade process?
>
> I found this bug, which touches timestamps and tombstones and was fixed in
> 1.1.10, but I am not 100% sure whether it could be related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-5153
>
> Me neither, but the issue was fixed in 1.1.10
>
>  It appears that the repair task that I executed after the upgrade
> brought lots of deleted rows back to life.
>
> Were they entire rows, or columns in a row?
> Do you know if row-level or column-level deletes were used?
>
> Can you look at the data in cassandra-cli and confirm the timestamps on the
> columns make sense?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/03/2013, at 2:31 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>
> Hi,
>
> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running
> repairs. It appears that the repair task that I executed after the upgrade
> brought lots of deleted rows back to life. Here are some logistics:
>
> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6
> - Old cluster: 4 nodes, C* 1.1.6 with RF 3 using NetworkTopology;
> - Upgrade to : 1.1.10 with all other settings the same;
> - Successful repairs were being done on this cluster every night;
> - Our clients use nanosecond precision timestamp for cassandra calls;
> - After the upgrade, while running repair, I saw some log messages like this
> on one node:
>
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 AntiEntropyService.java (line 1022)
>   [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.207.56 have 2223 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 AntiEntropyService.java (line 1022)
>   [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /23.20.207.56 have 161 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 AntiEntropyService.java (line 1022)
>   [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.250.43 have 2294 range(s) out of sync for App
> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 AntiEntropyService.java (line 789)
>   [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining column family to sync for this session)
>
> As you see, this node thinks lots of ranges are out of sync, which
> shouldn't be the case since successful repairs were done every night prior to
> the upgrade.
>
> The App CF uses SizeTiered with gc_grace of 10 days. It has caching =
> 'ALL', and it is fairly small (11 MB on each node).
>
> I found this bug, which touches timestamps and tombstones and was fixed in
> 1.1.10, but I am not 100% sure whether it could be related to this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-5153
>
> Any advice on how to dig deeper into this would be appreciated.
>
> Thanks,
> -Arya
>
>
>
>
