> I could imagine a scenario where a hint was replayed to a replica after all
> replicas had purged their tombstones

Scratch that, the hints are TTL'd with the lowest gc_grace. Ticket closed:
https://issues.apache.org/jira/browse/CASSANDRA-5379
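The TTL behaviour that closed the ticket can be sketched in miniature. This is a toy model, not Cassandra's actual hint code; the `Hint` class and `replay` function are illustrative names. The point is only that a hint TTL'd with the column family's gc_grace expires before any replica could have purged the matching tombstone, so a stale hint can never resurrect a delete:

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # example gc_grace for the target CF

class Hint:
    """Toy stand-in for a stored hint: the write it carries plus a TTL."""
    def __init__(self, write, created_at, ttl):
        self.write, self.created_at, self.ttl = write, created_at, ttl

    def is_live(self, now):
        return now - self.created_at < self.ttl

def replay(hint, replica, now):
    """Deliver the hinted write only if the hint has not expired."""
    if hint.is_live(now):
        replica[hint.write["key"]] = hint.write["value"]

# A replica that saw the delete and has long since purged the tombstone:
replica = {}

# Hint stored while the replica was down, TTL'd with the CF's gc_grace,
# mirroring the behaviour Aaron notes above.
hint = Hint({"key": "row1", "value": "v1"}, created_at=0, ttl=GC_GRACE_SECONDS)

# Replayed after gc_grace has elapsed: the hint is expired, nothing is
# delivered, and the deleted row cannot come back.
replay(hint, replica, now=GC_GRACE_SECONDS + 1)
assert "row1" not in replica

# Replayed within gc_grace the hint is delivered normally -- which is safe,
# because a delete issued in that window still has a live tombstone that
# would shadow the stale write (tombstones are not modelled here).
replay(hint, replica, now=3600)
assert replica["row1"] == "v1"
```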
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 6:24 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> Beside the joke, would hinted handoff really have any role in this issue?
>
> I could imagine a scenario where a hint was replayed to a replica after all
> replicas had purged their tombstones. That seems like a long shot: it would
> need one node to be down for the write, all nodes up for the delete, and all
> of them to have purged the tombstone. But maybe we should have a max age on
> hints so it cannot happen.
>
> Created https://issues.apache.org/jira/browse/CASSANDRA-5379
>
> Ensuring no hints are in place during an upgrade would work around it. I
> tend to make sure hints and the commit log are clear during an upgrade.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/03/2013, at 7:54 AM, Arya Goudarzi <gouda...@gmail.com> wrote:
>
>> Beside the joke, would hinted handoff really have any role in this issue?
>> I have been struggling to reproduce this issue using the snapshot data
>> taken from our cluster and following the same upgrade process from 1.1.6
>> to 1.1.10. I know snapshots only link to active SSTables. What if these
>> returned rows belong to some inactive SSTables and some bug exposed itself
>> and marked them as active? What are the possibilities that could lead to
>> this? I am eager to find out, as this is blocking our upgrade.
>>
>> On Tue, Mar 19, 2013 at 2:11 AM, <moshe.kr...@barclays.com> wrote:
>>
>> This obscure feature of Cassandra is called "haunted handoff".
>> Happy (early) April Fools :)
>>
>> From: aaron morton [mailto:aa...@thelastpickle.com]
>> Sent: Monday, March 18, 2013 7:45 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
>>
>> As you see, this node thinks lots of ranges are out of sync, which
>> shouldn't be the case as successful repairs were done every night prior to
>> the upgrade.
>>
>> Could this be explained by writes occurring during the upgrade process?
>>
>> I found this bug which touches timestamps and tombstones and which was
>> fixed in 1.1.10, but am not 100% sure if it could be related to this
>> issue: https://issues.apache.org/jira/browse/CASSANDRA-5153
>>
>> Me neither, but the issue was fixed in 1.1.10.
>>
>> It appears that the repair task that I executed after the upgrade brought
>> back lots of deleted rows to life.
>>
>> Was it entire rows or columns in a row?
>>
>> Do you know if row-level or column-level deletes were used?
>>
>> Can you look at the data in cassandra-cli and confirm the timestamps on
>> the columns make sense?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 16/03/2013, at 2:31 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running
>> repairs. It appears that the repair task that I executed after the upgrade
>> brought back lots of deleted rows to life.
>> Here are some logistics:
>>
>> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6
>> - Old cluster: 4 nodes, C* 1.1.6 with RF 3 using NetworkTopology
>> - Upgraded to: 1.1.10 with all other settings the same
>> - Successful repairs were being done on this cluster every night
>> - Our clients use nanosecond-precision timestamps for Cassandra calls
>> - After the upgrade, while running repair, I saw some log messages like
>>   this on one node:
>>
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.207.56 have 2223 range(s) out of sync for App
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /23.20.207.56 have 161 range(s) out of sync for App
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097 AntiEntropyService.java (line 1022) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /23.20.250.43 have 2294 range(s) out of sync for App
>> system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190 AntiEntropyService.java (line 789) [repair #0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining column family to sync for this session)
>>
>> As you see, this node thinks lots of ranges are out of sync, which
>> shouldn't be the case as successful repairs were done every night prior to
>> the upgrade.
>>
>> The App CF uses SizeTiered with a gc_grace of 10 days. It has
>> caching = 'ALL', and it is fairly small (11 MB on each node).
>> I found this bug which touches timestamps and tombstones and which was
>> fixed in 1.1.10, but am not 100% sure if it could be related to this
>> issue: https://issues.apache.org/jira/browse/CASSANDRA-5153
>>
>> Any advice on how to dig deeper into this would be appreciated.
>>
>> Thanks,
>> -Arya
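The classic mechanism by which repair resurrects deleted rows can be sketched as a toy model. This is not Cassandra code and the names are illustrative; it also does not by itself explain Arya's case (nightly repairs were running), but it shows in miniature why gc_grace matters: if one replica misses a delete and the others purge the tombstone after gc_grace, a later repair sees the stale copy as simply "missing" and streams it back:

```python
# Each cell is (value, timestamp_in_days, is_tombstone).
GC_GRACE = 10  # days, matching the App CF described in the thread

def compact(replica, now):
    """Purge tombstones older than gc_grace, as compaction would."""
    for key, (value, ts, deleted) in list(replica.items()):
        if deleted and now - ts > GC_GRACE:
            del replica[key]  # tombstone (and the data it shadowed) is gone

def repair(a, b):
    """Naive anti-entropy: copy any cell one replica has and the other lacks."""
    for key in set(a) | set(b):
        if key not in a:
            a[key] = b[key]
        elif key not in b:
            b[key] = a[key]

# Both replicas hold the row; replica B then misses the delete at day 0.
a = {"row1": ("v1", -5, False)}
b = {"row1": ("v1", -5, False)}
a["row1"] = (None, 0, True)      # delete lands on A only

compact(a, now=GC_GRACE + 1)     # A purges the tombstone after gc_grace
assert "row1" not in a           # A no longer remembers the delete at all

repair(a, b)                     # repair sees B's copy as "missing" on A
assert a["row1"] == ("v1", -5, False)   # the deleted row is back
```

This is why repair must complete on every replica within gc_grace of each delete, and why hints carrying writes older than gc_grace (the scenario in CASSANDRA-5379 above) would be dangerous if they were not TTL'd.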