Re: Range deletes, wide partitions, and reverse iterators

Nitan Kainth Tue, 16 May 2017 07:30:07 -0700

Hannu,

How can you read a partition in reverse?


> On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com> wrote:
> 
> Well, I’m guessing that Cassandra doesn't really know if the range tombstone 
> is useful for this or not. 
> 
> In many cases the partition might contain data that is within the range of 
> the tombstone but is newer than the tombstone, and therefore it might still 
> be returned. Scanning through deleted data can be avoided by reading the 
> partition in reverse (if all the deleted data is at the beginning of the 
> partition). Eventually you will still end up reading a lot of tombstones, but 
> you will get a lot of live data first, and the implicit query limit of 10000 
> is probably reached before you get to the tombstones. Therefore you will get 
> an immediate answer.
> 
> Does it make sense?
> 
> Hannu
> 
>> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I am seeing inconsistencies when mixing range tombstones, wide partitions, 
>> and reverse iterators.
>> I still have to understand whether this behaviour is expected, hence the 
>> message on the mailing list.
>> 
>> The situation is conceptually simple. I am using a table defined as follows:
>> 
>> CREATE TABLE test_cql.test_cf (
>>  hash blob,
>>  timeid timeuuid,
>>  PRIMARY KEY (hash, timeid)
>> ) WITH CLUSTERING ORDER BY (timeid ASC)
>>  AND compaction = {'class' : 'LeveledCompactionStrategy'};
>> 
>> I then proceed by loading 2/3GB from 3 sstables which I know contain a 
>> really wide partition (> 512 MB) for `hash = x`. I then delete the oldest 
>> _half_ of that partition by executing the query below, and restart the node:
>> 
>> DELETE 
>> FROM test_cql.test_cf 
>> WHERE hash = x AND timeid < y;
>> 
>> If I keep compactions disabled, the following query times out (takes more 
>> than 10 seconds to complete):
>> 
>> SELECT * 
>> FROM test_cql.test_cf 
>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>> ORDER BY timeid ASC;
>> 
>> While the following returns immediately (obviously because no deleted data 
>> is ever read):
>> 
>> SELECT * 
>> FROM test_cql.test_cf 
>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf 
>> ORDER BY timeid DESC;
>> 
>> If I force a compaction the problem is gone, but I presume that is just 
>> because the data is rearranged.
>> 
>> It seems to me that reading in ASC order does not make use of the range 
>> tombstone until C* reads the last sstable (which actually contains the range 
>> tombstone and is flushed at node restart), and that it wastes time reading 
>> all the rows that are no longer live. 
>> 
>> Is this expected? Should the range tombstone actually help in these cases?
>> 
>> Thanks a lot!
>> Stefano
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 
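To make the ASC vs. DESC difference discussed above concrete, here is a toy Python model. It is only a sketch and not Cassandra's actual read path. It assumes the reader cannot skip rows covered by the range tombstone (the behaviour Stefano observes), uses plain integers in place of timeuuid values, and ignores write timestamps, so the "newer data inside the range" case Hannu mentions is not modelled. The names `read_partition`, `tombstone_before` and `limit` are made up for illustration.

```python
from typing import List, Tuple


def read_partition(rows: List[int], tombstone_before: int, limit: int,
                   reverse: bool = False) -> Tuple[List[int], int]:
    """Scan one partition and return (live_rows, rows_scanned).

    rows             -- clustering keys in ascending order (stand-ins for timeid)
    tombstone_before -- rows with key < this value count as deleted,
                        mimicking DELETE ... WHERE timeid < y
    limit            -- stop once this many live rows have been collected
    reverse          -- iterate in descending clustering order
    """
    ordered = reversed(rows) if reverse else iter(rows)
    live: List[int] = []
    scanned = 0
    for timeid in ordered:
        scanned += 1
        if timeid < tombstone_before:
            continue  # shadowed by the range tombstone: read, then discarded
        live.append(timeid)
        if len(live) == limit:
            break
    return live, scanned


# A partition of 100,000 rows whose oldest half has been range-deleted.
rows = list(range(100_000))
y = 50_000

asc_live, asc_scanned = read_partition(rows, y, limit=100)
desc_live, desc_scanned = read_partition(rows, y, limit=100, reverse=True)

print("ASC  scanned:", asc_scanned)
print("DESC scanned:", desc_scanned)
```

In this model the ascending read walks through all 50,000 deleted rows before it collects its first live row (50,100 rows scanned in total), while the descending read fills its limit after scanning only 100 rows. That matches the pattern reported in the thread, where the DESC query returns immediately and the ASC query times out.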
