Correct me if I'm wrong, but tombstones are only really problematic if they
land on clustering keys and you then perform a range select over that
column, right (assuming it's not a symptom of the antipattern of
indefinitely overwriting the same value)?  That is, you're deleting
clustered rows off of a partition.  A tombstone isn't any more costly than
a normal column, and in some ways it's less costly (it's smaller at rest
than, say, inserting an empty string or other default value, as someone
suggested).
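
To make the clustering-key case concrete, here is a minimal sketch with the
DataStax Java driver (the keyspace, table, and values are made up purely
for illustration):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TombstoneScenario {
        public static void main(String[] args) {
            // Hypothetical schema:
            //   CREATE TABLE ks.events (pk int, ck int, payload text,
            //                           PRIMARY KEY (pk, ck));
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks");

            // Each delete of a clustered row leaves a row tombstone...
            session.execute("DELETE FROM events WHERE pk = 1 AND ck = 42");

            // ...and a later range select over the partition must scan past
            // every such tombstone before it can return the live rows:
            session.execute("SELECT * FROM events WHERE pk = 1 AND ck > 0");

            cluster.close();
        }
    }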

Tombstones stick around a little longer post-compaction than other values,
so that's a downside, but they also drop off the record, as if the value
had never been set, on the first compaction after the gc grace period
expires.
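
The grace period is the table's gc_grace_seconds option, which defaults to
864000 seconds (ten days); continuing the sketch above:

    // Tombstones become eligible for purging on the first compaction that
    // runs after gc_grace_seconds has elapsed since the delete.
    session.execute("ALTER TABLE events WITH gc_grace_seconds = 864000");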

Tombstones aren't intrinsically bad, but they can have some bad properties
in certain situations.  This doesn't strike me as one of them.  If you have
a way to avoid inserting null when you know you aren't occluding an
underlying value, that would be ideal.  But because the tombstone sits
adjacent on disk to the other values from the same insert, even if you were
on platters, the drive head is *already positioned* over the tombstone's
location when it's read, because it just read the prior and subsequent
values that were written during the same insert.

In the end, inserting a tombstone into a regular (non-clustering) column
shouldn't be appreciably worse (if it is at all) than inserting a value
instead.  Or am I missing something here?

On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

> Thank you all for the advice!
>
>
>
> I have decided to use the Insert query builder (
> *com.datastax.driver.core.querybuilder.Insert*) which allows me to
> dynamically insert as many or as few columns as I need, and doesn’t require
> multiple prepared statements. Then, I will look at Ali’s suggestion – I
> will create a small helper method like ‘addToInsertIfNotNull’ and pump all
> my values into that, which will then filter out the ones that are null.
> Should keep the code nice and neat – I will feed back if I find any
> problems with this approach (but please jump in if you have already spotted
> any :)).
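>
> Roughly what I have in mind (just a sketch; the table and columns are made
> up for illustration):
>
>     import com.datastax.driver.core.Session;
>     import com.datastax.driver.core.querybuilder.Insert;
>     import com.datastax.driver.core.querybuilder.QueryBuilder;
>
>     public class Inserts {
>         // Only add the column to the INSERT when the value is non-null,
>         // so no tombstone is written for absent values.
>         static Insert addToInsertIfNotNull(Insert insert, String col,
>                                            Object val) {
>             if (val != null) {
>                 insert.value(col, val);
>             }
>             return insert;
>         }
>
>         static void save(Session session, String id, String email,
>                          Integer age) {
>             Insert insert = QueryBuilder.insertInto("mytable")
>                     .value("id", id); // key column, always present
>             addToInsertIfNotNull(insert, "email", email);
>             addToInsertIfNotNull(insert, "age", age);
>             session.execute(insert); // omitted columns -> no tombstones
>         }
>     }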
>
>
>
> Thanks!
>
> Matt
>
>
>
> *From:* Robert Wille [mailto:rwi...@fold3.com]
> *Sent:* 29 April 2015 15:16
> *To:* user@cassandra.apache.org
> *Subject:* Re: Inserting null values
>
>
>
> I’ve come across the same thing. I have a table with at least half a dozen
> columns that could be null, in any combination. Having a prepared statement
> for each combination of null columns (2^6 = 64 of them for six nullable
> columns) just isn’t going to happen. I don’t want to build custom queries
> each time, because I have a really cool system of managing my queries that
> relies on them being prepared.
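>
> The kind of statement in question looks roughly like this (a sketch; the
> table and columns are made up):
>
>     import com.datastax.driver.core.PreparedStatement;
>     import com.datastax.driver.core.Session;
>
>     public class PreparedInserts {
>         static void save(Session session, String id, String a, String b,
>                          String c) {
>             // In reality this is prepared once and cached.
>             PreparedStatement ps = session.prepare(
>                     "INSERT INTO mytable (id, a, b, c) VALUES (?, ?, ?, ?)");
>             // Every execution must bind all four columns, so a null for
>             // any of a, b, c writes a cell tombstone for that column.
>             session.execute(ps.bind(id, a, b, c));
>         }
>     }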
>
>
>
> Fortunately for me, I should have at most a handful of tombstones in each
> partition, and most of my records are written exactly once. So, I just let
> the tombstones get written and they’ll eventually get compacted out and
> life will go on.
>
>
>
> It’s annoying and not ideal, but what can you do?
>
>
>
> On Apr 29, 2015, at 2:36 AM, Matthew Johnson <matt.john...@algomi.com>
> wrote:
>
>
>
> Hi all,
>
>
>
> I have some fields that I am storing in Cassandra, but some of them
> could be null at any given point. As there are quite a lot of them, the
> code is much more readable if I don’t check each one for null before
> adding it to the INSERT.
>
>
>
> I can see a few Jiras around CQL 3 supporting inserting nulls:
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-3783
>
> https://issues.apache.org/jira/browse/CASSANDRA-5648
>
>
>
> But I have tested inserting null and it seems to work fine (when querying
> the table with cqlsh, it shows up as a red lowercase *null*).
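>
> For example, with the Java driver (a sketch; the table and values are just
> for illustration):
>
>     import com.datastax.driver.core.PreparedStatement;
>     import com.datastax.driver.core.Session;
>
>     public class NullInsertTest {
>         static void test(Session session) {
>             PreparedStatement ps = session.prepare(
>                     "INSERT INTO mytable (id, email) VALUES (?, ?)");
>             // The null binding is accepted; under the hood the null is
>             // stored as a cell tombstone for the email column.
>             session.execute(ps.bind("abc", null));
>         }
>     }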
>
>
>
> Are there any obvious pitfalls to look out for that I have missed? Could
> it be a performance concern to insert a row with some nulls, as opposed to
> checking the values first and simply omitting the null columns from the
> insert?
>
>
>
> Thanks!
>
> Matt
>
>
>
