Enough tombstones can inflate the size of an SSTable and cause issues during
compaction (imagine a multi-TB SSTable that is 99% tombstones), even if
there's no clustering key defined.

Perhaps an edge case, but worth considering.

On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens <migh...@gmail.com> wrote:

> Correct me if I'm wrong, but tombstones are only really problematic if you
> have them going into clustering keys and then perform a range select on that
> column, right (assuming it's not a symptom of the antipattern of
> indefinitely overwriting the same value)?  I.e., you're deleting clusters
> off of a partition.  A tombstone isn't any more costly than a normal column,
> and in some ways it's less costly (it's smaller at rest than, say, inserting
> an empty string or other default value, as someone suggested).
>
> Tombstones stay around a little longer post-compaction than other values,
> so that's a downside, but they also drop out entirely, as if the value had
> never been set, on the first compaction after the gc grace period expires.
>
> Tombstones aren't intrinsically bad, but they can have some bad properties
> in certain situations.  This doesn't strike me as one of them.  If you have
> a way to avoid inserting null when you know you aren't occluding an
> underlying value, that would be ideal.  But because the tombstone would sit
> adjacent on disk to other values from the same insert, even if you were on
> platters, the drive head is *already positioned* over the tombstone
> location when it's read, because it read the prior value and subsequent
> value which were written during the same insert.
>
> In the end, inserting a tombstone into a non-clustered column shouldn't be
> appreciably worse (if it is at all) than inserting a value instead.  Or am
> I missing something here?
>
> On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson <matt.john...@algomi.com>
> wrote:
>
>> Thank you all for the advice!
>>
>>
>>
>> I have decided to use the Insert query builder (
>> *com.datastax.driver.core.querybuilder.Insert*) which allows me to
>> dynamically insert as many or as few columns as I need, and doesn’t require
>> multiple prepared statements. Then, I will look at Ali’s suggestion – I
>> will create a small helper method like ‘addToInsertIfNotNull’ and pump all
>> my values into that, which will then filter out the ones that are null.
>> Should keep the code nice and neat – I will feed back if I find any
>> problems with this approach (but please jump in if you have already spotted
>> any :)).
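[The helper described above could be sketched roughly as follows. This is a
self-contained illustration of the "add only if not null" pattern using plain
CQL string assembly; in the real code the retained columns would be fed into
the driver's com.datastax.driver.core.querybuilder.Insert instead. All names
here (NullSafeInsert, addIfNotNull) are hypothetical.]

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the "addToInsertIfNotNull" idea: collect column/value pairs,
// silently dropping nulls so no tombstone is ever written for absent values.
public class NullSafeInsert {
    private final String table;
    // LinkedHashMap preserves insertion order of the columns.
    private final Map<String, Object> columns = new LinkedHashMap<>();

    public NullSafeInsert(String table) {
        this.table = table;
    }

    // Record the column only when the value is non-null.
    public NullSafeInsert addIfNotNull(String column, Object value) {
        if (value != null) {
            columns.put(column, value);
        }
        return this;
    }

    // Render a CQL INSERT with bind markers for the retained columns only.
    public String toCql() {
        String cols = String.join(", ", columns.keySet());
        String marks = columns.keySet().stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Bind values, in the same order as the columns in toCql().
    public Object[] values() {
        return columns.values().toArray();
    }
}
```

[Used as, e.g., `new NullSafeInsert("users").addIfNotNull("id", 42)
.addIfNotNull("email", null).addIfNotNull("name", "Matt")`, the null email is
skipped and the generated statement names only id and name, so nothing is
overwritten with a tombstone.]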
>>
>>
>>
>> Thanks!
>>
>> Matt
>>
>>
>>
>> *From:* Robert Wille [mailto:rwi...@fold3.com]
>> *Sent:* 29 April 2015 15:16
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Inserting null values
>>
>>
>>
>> I’ve come across the same thing. I have a table with at least half a
>> dozen columns that could be null, in any combination. Having a prepared
>> statement for each permutation of null columns just isn’t going to happen.
>> I don’t want to build custom queries each time because I have a really cool
>> system of managing my queries that relies on them being prepared.
>>
>>
>>
>> Fortunately for me, I should have at most a handful of tombstones in each
>> partition, and most of my records are written exactly once. So, I just let
>> the tombstones get written and they’ll eventually get compacted out and
>> life will go on.
>>
>>
>>
>> It’s annoying and not ideal, but what can you do?
>>
>>
>>
>> On Apr 29, 2015, at 2:36 AM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I have some fields that I am storing into Cassandra, but some of them
>> could be null at any given point. As there are quite a lot of them, it
>> makes the code much more readable if I don’t check each one for null before
>> adding it to the INSERT.
>>
>>
>>
>> I can see a few Jiras around CQL 3 supporting inserting nulls:
>>
>>
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-3783
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-5648
>>
>>
>>
>> But I have tested inserting null and it seems to work fine (when querying
>> the table with cqlsh, it shows up as a red lowercase *null*).
>>
>>
>>
>> Are there any obvious pitfalls to look out for that I have missed? Could
>> it be a performance concern to insert a row with some nulls, as opposed to
>> checking the values first and inserting the row and just omitting those
>> columns?
>>
>>
>>
>> Thanks!
>>
>> Matt
>>
>>
>>
>
>
