Hannu,

How can you read a partition in reverse?

Sent from my iPhone

> On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com> wrote:
>
> Well, I’m guessing that Cassandra doesn't really know if the range tombstone
> is useful for this or not.
>
> In many cases it might be that the partition contains data that is within the
> range of the tombstone but is newer than the tombstone, and therefore it might
> still be returned. Scanning through deleted data can be avoided by reading
> the partition in reverse (if all the deleted data is at the beginning of the
> partition). Eventually you will still end up reading a lot of tombstones, but
> you will get a lot of live data first, and the implicit query limit of 10000
> is probably reached before you get to the tombstones. Therefore you will get
> an immediate answer.
>
> Does it make sense?
>
> Hannu
>
>> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am seeing inconsistencies when mixing range tombstones, wide partitions,
>> and reverse iterators.
>> I still have to understand if the behaviour is to be expected, hence the
>> message on the mailing list.
>>
>> The situation is conceptually simple. I am using a table defined as follows:
>>
>> CREATE TABLE test_cql.test_cf (
>>     hash blob,
>>     timeid timeuuid,
>>     PRIMARY KEY (hash, timeid)
>> ) WITH CLUSTERING ORDER BY (timeid ASC)
>>   AND compaction = {'class' : 'LeveledCompactionStrategy'};
>>
>> I then proceed by loading 2/3GB from 3 sstables which I know contain a
>> really wide partition (> 512 MB) for `hash = x`. I then delete the oldest
>> _half_ of that partition by executing the query below, and restart the node:
>>
>> DELETE
>> FROM test_cql.test_cf
>> WHERE hash = x AND timeid < y;
>>
>> If I keep compactions disabled, the following query times out (takes more
>> than 10 seconds to succeed):
>>
>> SELECT *
>> FROM test_cql.test_cf
>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>> ORDER BY timeid ASC;
>>
>> While the following returns immediately (obviously because no deleted data
>> is ever read):
>>
>> SELECT *
>> FROM test_cql.test_cf
>> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>> ORDER BY timeid DESC;
>>
>> If I force a compaction the problem is gone, but I presume just because the
>> data is rearranged.
>>
>> It seems to me that reading by ASC does not make use of the range tombstone
>> until C* reads the last sstable (which actually contains the range
>> tombstone and is flushed at node restart), and it wastes time reading all
>> rows that are actually not live anymore.
>>
>> Is this expected? Should the range tombstone actually help in these cases?
>>
>> Thanks a lot!
>> Stefano
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
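
On the question at the top of the thread: in CQL, a partition is read in reverse by ordering on the clustering column in the opposite direction from the table's CLUSTERING ORDER, which is what Stefano's second query does. The following is a minimal sketch against the schema quoted above, not a tuned recommendation. The explicit LIMIT value and the table name test_cf_newest_first are illustrative assumptions and do not come from the thread.

```cql
-- Reverse read of a single partition: test_cf clusters by timeid ASC,
-- so ORDER BY timeid DESC returns the newest rows first.
-- The LIMIT of 100 is an illustrative value, not taken from the thread.
SELECT *
FROM test_cql.test_cf
WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
ORDER BY timeid DESC
LIMIT 100;

-- Alternative layout (hypothetical table name): store rows newest-first,
-- so that the default read order is already descending.
CREATE TABLE test_cql.test_cf_newest_first (
    hash blob,
    timeid timeuuid,
    PRIMARY KEY (hash, timeid)
) WITH CLUSTERING ORDER BY (timeid DESC);
```

With the second layout, a plain SELECT on the partition without ORDER BY returns the newest rows first, and ORDER BY timeid ASC becomes the reversed read.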