Re: Range deletes, wide partitions, and reverse iterators

Stefano Ortolani Tue, 16 May 2017 07:33:49 -0700

That is another way to put the question: are reverse iterators
range-tombstone aware? Yes.
That is why I am puzzled by the aforementioned behavior.
I would expect them to handle this case more gracefully.
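
To make that expectation concrete, here is a toy Python model of the three
read patterns. It is only a sketch and not Cassandra's actual read path: the
function names, the sorted list standing in for a partition, and the single
"delete everything below tombstone_end" marker are all assumptions made for
illustration.

```python
import bisect
from typing import List, Tuple


def scan(rows: List[int], tombstone_end: int, limit: int,
         reverse: bool = False) -> Tuple[List[int], int]:
    """Naive scan: visit rows one by one and drop the deleted ones.

    `rows` is a sorted list of clustering keys (stand-ins for timeid).
    Every key strictly below `tombstone_end` is covered by the range
    tombstone. Returns (live rows found, number of rows visited).
    """
    live: List[int] = []
    visited = 0
    ordered = reversed(rows) if reverse else rows
    for key in ordered:
        if len(live) >= limit:
            break
        visited += 1
        if key < tombstone_end:
            continue  # shadowed by the range tombstone, but still read
        live.append(key)
    return live, visited


def scan_with_skip(rows: List[int], tombstone_end: int,
                   limit: int) -> Tuple[List[int], int]:
    """Forward scan that uses the tombstone bound to seek past dead rows."""
    start = bisect.bisect_left(rows, tombstone_end)
    live = rows[start:start + limit]
    return live, len(live)


rows = list(range(1000))   # a "wide partition" with 1000 rows
tombstone_end = 500        # the oldest half has been range-deleted
limit = 100

asc_live, asc_visited = scan(rows, tombstone_end, limit)
desc_live, desc_visited = scan(rows, tombstone_end, limit, reverse=True)
skip_live, skip_visited = scan_with_skip(rows, tombstone_end, limit)
print(asc_visited, desc_visited, skip_visited)
```

In this model the naive ascending scan has to walk through every deleted row
before it finds any live data, the descending scan reaches the limit without
touching the deleted half, and the ascending scan that seeks past the
tombstone bound does the same amount of work as the descending one. The last
case is what I would have expected from a range-tombstone-aware iterator.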


Cheers,
Stefano

On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com> wrote:

> Hannu,
>
> How can you read a partition in reverse?
>
> Sent from my iPhone
>
> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com> wrote:
> >
> > Well, I’m guessing that Cassandra doesn't really know if the range
> tombstone is useful for this or not.
> >
> > In many cases it might be that the partition contains data that is
> within the range of the tombstone but is newer than the tombstone and
> therefore might still be returned. Scanning through deleted data can
> be avoided by reading the partition in reverse (if all the deleted data is
> at the beginning of the partition). Eventually you will still end up
> reading a lot of tombstones, but you will get a lot of live data first, and
> the implicit query limit of 10000 is probably reached before you get to the
> tombstones. Therefore you will get an immediate answer.
> >
> > Does it make sense?
> >
> > Hannu
> >
> >> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I am seeing inconsistencies when mixing range tombstones, wide
> partitions, and reverse iterators.
> >> I have yet to understand whether this behaviour is expected, hence the
> message on the mailing list.
> >>
> >> The situation is conceptually simple. I am using a table defined as
> follows:
> >>
> >> CREATE TABLE test_cql.test_cf (
> >>  hash blob,
> >>  timeid timeuuid,
> >>  PRIMARY KEY (hash, timeid)
> >> ) WITH CLUSTERING ORDER BY (timeid ASC)
> >>  AND compaction = {'class' : 'LeveledCompactionStrategy'};
> >>
> >> I then proceed by loading 2-3 GB from 3 sstables which I know contain a
> really wide partition (> 512 MB) for `hash = x`. I then delete the oldest
> _half_ of that partition by executing the query below, and restart the node:
> >>
> >> DELETE
> >> FROM test_cql.test_cf
> >> WHERE hash = x AND timeid < y;
> >>
> >> If I keep compactions disabled, the following query times out (it takes
> more than 10 seconds to complete):
> >>
> >> SELECT *
> >> FROM test_cql.test_cf
> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
> >> ORDER BY timeid ASC;
> >>
> >> While the following returns immediately (obviously because no deleted
> data is ever read):
> >>
> >> SELECT *
> >> FROM test_cql.test_cf
> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
> >> ORDER BY timeid DESC;
> >>
> >> If I force a compaction the problem is gone, but I presume only because
> the data is rearranged.
> >>
> >> It seems to me that reading in ASC order does not make use of the range
> tombstone until C* reads the
> >> last sstable (which actually contains the range tombstone and is
> flushed at node restart), so it wastes time reading all the rows that are
> no longer live.
> >>
> >> Is this expected? Should the range tombstone actually help in these
> cases?
> >>
> >> Thanks a lot!
> >> Stefano
> >
> >
> >
>
