Re: Questions about C* performance related to tombstone

Alok Dwivedi Wed, 10 Apr 2019 02:14:18 -0700

Your delete query 
>> "DELETE FROM myTable WHERE course_id = 'C' AND assignment_id = 'A1';”.
will generate multi row range tombstones. Since you are reading entire 
partition which effectively will be read in pages (slice query equivalent) you 
may get tombstones in certain pages depending upon how much deletes you are 
doing. However looking at your use case I don’t think you will end with very 
high ratio of deleted to live data so normal deletes should be fine as is 
already pointed out below. Note that range tombstones are more effective 
storage space wise as they have start/end range rather than deleted info for 
every deleted row. So I also don’t think  your workaround of using ‘active’ 
flag is really needed unless its for auditing. Another thing to note is if you 
have a use case where you want to be more aggressive in evicting tombstones 
then here are some settings worth exploring
- tombstone_threshold
- unchecked_tombstone_compaction
-tombstone_compaction_interval
Additionally gc_grace_seconds can be looked at but it must be handled very 
carefully as we must ensure that repair completes in an interval less than this 
setting to prevent any deleted data reappearing.


Regards
Alok


> On 9 Apr 2019, at 15:56, Jon Haddad <j...@jonhaddad.com> wrote:
> 
> Normal deletes are fine.
> 
> Sadly there's a lot of hand wringing about tombstones in the generic
> sense which leads people to try to work around *every* case where
> they're used.  This is unnecessary.  A tombstone over a single row
> isn't a problem, especially if you're only fetching that one row back.
> Tombstones can be quite terrible under a few conditions:
> 
> 1. When a range tombstone shadows hundreds / thousands / millions of
> rows.  This wasn't even detectable prior to Cassandra 3 unless you
> were either looking for it specifically or were doing CPU profiling:
> http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html
> 2. When rows were frequently created then deleted, and scanned over.
> This is the queue pattern that we detest so much.
> 3. When they'd be created as a side effect from over writing
> collections.  This is an accident typically.
> 
> The 'active' flag is good if you want to be able to go back and look
> at old deleted assignments.  If you don't care about that, use a
> normal delete.
> 
> Jon
> 
> On Tue, Apr 9, 2019 at 7:00 AM Li, George <guangxing...@pearson.com> wrote:
>> 
>> Hi,
>> 
>> I have a table defined like this:
>> 
>> CREATE TABLE myTable (
>> course_id text,
>> assignment_id text,
>> assignment_item_id text,
>> data text,
>> boolean active,
>> PRIMARY KEY (course_id, assignment_id, assignment_item_id)
>> );
>> i.e. course_id as the partition key and assignment_id, assignment_item_id as 
>> clustering keys.
>> 
>> After data is populated, some delete queries by course_id and assignment_id 
>> occurs, e.g. "DELETE FROM myTable WHERE course_id = 'C' AND assignment_id = 
>> 'A1';". This would create tombstones so query "SELECT * FROM myTable WHERE 
>> course_id = 'C';" would be affected, right? Would query "SELECT * FROM 
>> myTable WHERE course_id = 'C' AND assignment_id = 'A2';" be affected too?
>> 
>> For query "SELECT * FROM myTable WHERE course_id = 'C';", to workaround the 
>> tombstone problem, we are thinking about not doing hard deletes, instead 
>> doing soft deletes. So instead of doing "DELETE FROM myTable WHERE course_id 
>> = 'C' AND assignment_id = 'A1';", we do "UPDATE myTable SET active = false 
>> WHERE course_id = 'C' AND assignment_id = 'A1';". Then in the application, 
>> we do query "SELECT * FROM myTable WHERE course_id = 'C';" and filter out 
>> records that have "active" equal to "false". I am not really sure this would 
>> improve performance because C* still has to scan through all records with 
>> the partition key "C". It is just instead of scanning through X records + Y 
>> tombstone records with hard deletes that generate tombstones, it now scans 
>> through X + Y records with soft deletes and no tombstones. Am I right?
>> 
>> Thanks.
>> 
>> George
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>

Re: Questions about C* performance related to tombstone

Reply via email to