Enough tombstones can inflate the size of an SSTable and cause issues during compaction (imagine a multi-TB SSTable with 99% tombstones), even if there's no clustering key defined.
Perhaps an edge case, but worth considering.

On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens <migh...@gmail.com> wrote:

> Correct me if I'm wrong, but tombstones are only really problematic if you
> have them going into clustering keys, then perform a range select on that
> column, right (assuming it's not a symptom of the antipattern of
> indefinitely overwriting the same value)? I.e. you're deleting clusters
> off of a partition. A tombstone isn't any more costly, and in some ways
> less costly than a normal column (it's a smaller size at rest than, say,
> inserting an empty string or other default value as someone suggested).
>
> Tombstones stay around a little longer post-compaction than other values,
> so that's a downside, but they also would drop off the record as if it had
> never been set on the next compaction after gc grace period.
>
> Tombstones aren't intrinsically bad, but they can have some bad properties
> in certain situations. This doesn't strike me as one of them. If you have
> a way to avoid inserting null when you know you aren't occluding an
> underlying value, that would be ideal. But because the tombstone would sit
> adjacent on disk to other values from the same insert, even if you were on
> platters, the drive head is *already positioned* over the tombstone
> location when it's read, because it read the prior value and subsequent
> value which were written during the same insert.
>
> In the end, inserting a tombstone into a non-clustered column shouldn't be
> appreciably worse (if it is at all) than inserting a value instead. Or am
> I missing something here?
>
> On Wed, Apr 29, 2015 at 7:53 AM, Matthew Johnson <matt.john...@algomi.com>
> wrote:
>
>> Thank you all for the advice!
>>
>> I have decided to use the Insert query builder
>> (*com.datastax.driver.core.querybuilder.Insert*), which allows me to
>> dynamically insert as many or as few columns as I need, and doesn’t
>> require multiple prepared statements.
>>
>> Then, I will look at Ali’s suggestion – I will create a small helper
>> method like ‘addToInsertIfNotNull’ and pump all my values into that,
>> which will then filter out the ones that are null. Should keep the code
>> nice and neat – I will feed back if I find any problems with this
>> approach (but please jump in if you have already spotted any :)).
>>
>> Thanks!
>>
>> Matt
>>
>> *From:* Robert Wille [mailto:rwi...@fold3.com]
>> *Sent:* 29 April 2015 15:16
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Inserting null values
>>
>> I’ve come across the same thing. I have a table with at least half a
>> dozen columns that could be null, in any combination. Having a prepared
>> statement for each permutation of null columns just isn’t going to
>> happen. I don’t want to build custom queries each time because I have a
>> really cool system of managing my queries that relies on them being
>> prepared.
>>
>> Fortunately for me, I should have at most a handful of tombstones in
>> each partition, and most of my records are written exactly once. So, I
>> just let the tombstones get written and they’ll eventually get compacted
>> out and life will go on.
>>
>> It’s annoying and not ideal, but what can you do?
>>
>> On Apr 29, 2015, at 2:36 AM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>> Hi all,
>>
>> I have some fields that I am storing into Cassandra, but some of them
>> could be null at any given point. As there are quite a lot of them, it
>> makes the code much more readable if I don’t check each one for null
>> before adding it to the INSERT.
>>
>> I can see a few Jiras around CQL 3 supporting inserting nulls:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-3783
>> https://issues.apache.org/jira/browse/CASSANDRA-5648
>>
>> But I have tested inserting null and it seems to work fine (when
>> querying the table with cqlsh, it shows up as a red lowercase *null*).
>>
>> Are there any obvious pitfalls to look out for that I have missed?
>> Could it be a performance concern to insert a row with some nulls, as
>> opposed to checking the values first and inserting the row and just
>> omitting those columns?
>>
>> Thanks!
>>
>> Matt
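For readers landing on this thread later: a minimal, driver-free sketch of the null-filtering helper Matthew describes. The class and method names here (`NullFilteringInsert`, `addIfNotNull`) are illustrative, not from the thread; real code would hand the surviving column/value pairs to the driver's `QueryBuilder.insertInto(...).value(...)` rather than building CQL text, but the filtering idea is the same — columns that are null are simply never added, so no tombstone is ever written for them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of an "add only if not null" insert helper.
// Columns with null values are skipped entirely, so the resulting
// INSERT never writes a tombstone for an absent field.
public class NullFilteringInsert {
    private final Map<String, Object> values = new LinkedHashMap<>();

    public NullFilteringInsert addIfNotNull(String column, Object value) {
        if (value != null) {
            values.put(column, value); // nulls are filtered out here
        }
        return this;
    }

    // Render a CQL INSERT containing only the columns that were set.
    public String toCql(String table) {
        StringBuilder cols = new StringBuilder();
        StringBuilder binds = new StringBuilder();
        for (String col : values.keySet()) {
            if (cols.length() > 0) {
                cols.append(", ");
                binds.append(", ");
            }
            cols.append(col);
            binds.append("?");
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + binds + ")";
    }

    // The values to bind, in the same order as the rendered columns.
    public Map<String, Object> boundValues() {
        return values;
    }

    public static void main(String[] args) {
        NullFilteringInsert ins = new NullFilteringInsert()
                .addIfNotNull("id", 1)
                .addIfNotNull("name", "matt")
                .addIfNotNull("nickname", null); // skipped: no tombstone
        System.out.println(ins.toCql("users"));
        // INSERT INTO users (id, name) VALUES (?, ?)
    }
}
```

Note this trades away one thing the thread also discusses: because the column set varies per row, each distinct combination produces a distinct statement, so statements built this way can't all be prepared once up front (Robert's objection above).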