But we're talking about a single tombstone on each of a finite (small) set
of values, right?  We're not talking about INSERTs which are 99% nulls (at
least I don't think that's what Matthew was suggesting).  Unless you're
engaging in the antipattern of repeated overwrite, I'm still struggling to
see why this is worse than an equivalent number of non-tombstoned writes.
In fact, from the description, I don't think these tombstones are even
occluding any value at all.

> imagine a multi tb sstable w/ 99% tombstones

Let's play with this hypothetical, which doesn't seem like a probable
consequence of the original question.  You'd have to have taken enough
writes *inside* gc grace period to have even produced a multi-TB sstable to
come anywhere near this, and even then this either exceeds or comes really
close to the recommended maximum total data size per node (let alone in a
single sstable).  If you did have such an sstable, it doesn't seem very
likely to compact again inside gc grace period short of a manually
triggered major compaction.

But let's assume you do that: you run cassandra-stress inserting nothing
but tombstones, and kick off major compaction periodically.  If it
compacted inside gc grace period, is this worse for compaction than the
same number of non-tombstoned values (i.e. a multi-TB sstable is costly to
compact no matter what the contents)?  If it compacted outside gc grace
period, then 99% of the work is just dropping tombstones; it seems like it
would run really fast (for being an absurdly large sstable), as there would
be just 1% of the contents to actually copy over to the new sstable.

I'm still not clear on what I'm missing.  Is a tombstone more expensive to
compact than a non-tombstone?

On Wed, Apr 29, 2015 at 10:06 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Enough tombstones can inflate the size of an SSTable causing issues during
> compaction (imagine a multi tb sstable w/ 99% tombstones) even if there's
> no clustering key defined.
>
> Perhaps an edge case, but worth considering.
>
> On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens <migh...@gmail.com> wrote:
>
>> Correct me if I'm wrong, but tombstones are only really problematic if
>> you have them going into clustering keys, then perform a range select on
>> that column, right (assuming it's not a symptom of the antipattern of
>> indefinitely overwriting the same value)?  I.e. you're deleting clusters
>> off of a partition.  A tombstone isn't any more costly than a normal
>> column, and is in some ways less costly (it's a smaller size at rest
>> than, say, inserting an empty string or other default value, as someone
>> suggested).
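>>
>> A quick sketch of the case I mean, with made-up keyspace, table, and
>> column names (assuming an open driver Session):
>>
>>   // Rows cluster by event_time within each device_id partition.
>>   session.execute("CREATE TABLE IF NOT EXISTS my_ks.events ("
>>       + " device_id text, event_time timestamp, payload text,"
>>       + " PRIMARY KEY (device_id, event_time))");
>>
>>   // Each deleted cluster leaves a tombstone in the partition...
>>   session.execute("DELETE FROM my_ks.events"
>>       + " WHERE device_id = 'd1' AND event_time = '2015-04-01 00:00:00'");
>>
>>   // ...and a range select over that partition has to read past all of
>>   // those tombstones before it can return the live rows.
>>   session.execute("SELECT * FROM my_ks.events"
>>       + " WHERE device_id = 'd1' AND event_time > '2015-01-01 00:00:00'");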
>>
>> Tombstones stay around a little longer post-compaction than other values,
>> so that's a downside, but on the first compaction after gc grace period
>> they drop out entirely, as if the value had never been set.
>>
>> Tombstones aren't intrinsically bad, but they can have some bad
>> properties in certain situations.  This doesn't strike me as one of them.
>> If you have a way to avoid inserting null when you know you aren't
>> occluding an underlying value, that would be ideal.  But the tombstone
>> would sit adjacent on disk to the other values from the same insert, so
>> even if you were on platters, the drive head is *already positioned* over
>> the tombstone location when it's read: it has just read the prior and
>> subsequent values written during the same insert.
>>
>> In the end, inserting a tombstone into a non-clustered column shouldn't
>> be appreciably worse (if it is at all) than inserting a value instead.  Or
>> am I missing something here?
>>
>> On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson <matt.john...@algomi.com
>> > wrote:
>>
>>> Thank you all for the advice!
>>>
>>>
>>>
>>> I have decided to use the Insert query builder (
>>> *com.datastax.driver.core.querybuilder.Insert*), which allows me to
>>> dynamically insert as many or as few columns as I need, and doesn’t require
>>> multiple prepared statements. Then, I will look at Ali’s suggestion – I
>>> will create a small helper method like ‘addToInsertIfNotNull’ and pump all
>>> my values into that, which will then filter out the ones that are null.
>>> Should keep the code nice and neat – I will feed back if I find any
>>> problems with this approach (but please jump in if you have already spotted
>>> any :)).
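>>>
>>> Roughly what I have in mind, as a sketch only (table and column names
>>> below are just placeholders, against the 2.x query builder):
>>>
>>>   import com.datastax.driver.core.Session;
>>>   import com.datastax.driver.core.querybuilder.Insert;
>>>   import com.datastax.driver.core.querybuilder.QueryBuilder;
>>>
>>>   // Only add the column when there is actually a value, so no tombstone
>>>   // gets written for the absent ones.
>>>   static Insert addToInsertIfNotNull(Insert insert, String column, Object value) {
>>>       if (value != null) {
>>>           insert.value(column, value);
>>>       }
>>>       return insert;
>>>   }
>>>
>>>   void save(Session session, String id, String note, Integer score) {
>>>       Insert insert = QueryBuilder.insertInto("my_ks", "my_table")
>>>               .value("id", id); // the key is always present
>>>       addToInsertIfNotNull(insert, "note", note);   // may be null
>>>       addToInsertIfNotNull(insert, "score", score); // may be null
>>>       session.execute(insert);
>>>   }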
>>>
>>>
>>>
>>> Thanks!
>>>
>>> Matt
>>>
>>>
>>>
>>> *From:* Robert Wille [mailto:rwi...@fold3.com]
>>> *Sent:* 29 April 2015 15:16
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Inserting null values
>>>
>>>
>>>
>>> I’ve come across the same thing. I have a table with at least half a
>>> dozen columns that could be null, in any combination. Having a prepared
>>> statement for each permutation of null columns just isn’t going to happen.
>>> I don’t want to build custom queries each time because I have a really cool
>>> system of managing my queries that relies on them being prepared.
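>>>
>>> For illustration (column names invented, 2.x Java driver), the single
>>> prepared statement I mean looks like this; binding null for the absent
>>> columns is exactly what writes the tombstones I'm living with:
>>>
>>>   PreparedStatement ps = session.prepare(
>>>       "INSERT INTO my_ks.my_table (id, note, score) VALUES (?, ?, ?)");
>>>
>>>   // One statement covers every combination of present/absent columns; a
>>>   // null binding still sends the column, and it lands as a tombstone
>>>   // rather than as "column not written".
>>>   session.execute(ps.bind("row-1", null, 42));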
>>>
>>>
>>>
>>> Fortunately for me, I should have at most a handful of tombstones in
>>> each partition, and most of my records are written exactly once. So, I just
>>> let the tombstones get written and they’ll eventually get compacted out and
>>> life will go on.
>>>
>>>
>>>
>>> It’s annoying and not ideal, but what can you do?
>>>
>>>
>>>
>>> On Apr 29, 2015, at 2:36 AM, Matthew Johnson <matt.john...@algomi.com>
>>> wrote:
>>>
>>>
>>>
>>> Hi all,
>>>
>>>
>>>
>>> I have some fields that I am storing into Cassandra, but some of them
>>> could be null at any given point. As there are quite a lot of them, it
>>> makes the code much more readable if I don’t check each one for null before
>>> adding it to the INSERT.
>>>
>>>
>>>
>>> I can see a few Jiras around CQL 3 supporting inserting nulls:
>>>
>>>
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-3783
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-5648
>>>
>>>
>>>
>>> But I have tested inserting null and it seems to work fine (when
>>> querying the table with cqlsh, it shows up as a red lowercase *null*).
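>>>
>>> For example (table and column names are just placeholders), through the
>>> Java driver:
>>>
>>>   // Inserting an explicit null executes without error; cqlsh then shows
>>>   // the column as null for that row.
>>>   session.execute(
>>>       "INSERT INTO my_ks.my_table (id, note, score) VALUES ('row-1', null, 42)");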
>>>
>>>
>>>
>>> Are there any obvious pitfalls to look out for that I have missed? Could
>>> it be a performance concern to insert a row with some nulls, as opposed to
>>> checking the values first and inserting the row and just omitting those
>>> columns?
>>>
>>>
>>>
>>> Thanks!
>>>
>>> Matt
>>>
>>>
>>>
>>
>>
