I agree that inserting null is not as good as omitting that column entirely when you are confident that you are not shadowing any underlying data. But pragmatically speaking, it doesn't sound like a small number of incidental nulls/tombstones (< 20% of columns; otherwise CASSANDRA-3442 takes over) is going to have any practical performance impact, either on your query patterns or on compaction.
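For the case where you are confident nothing underneath needs shadowing, here is a rough sketch of the "don't insert that column at all" approach done client-side. Everything in it is illustrative: `build_insert`, the `events` table, and its column names are hypothetical, and the `%s` placeholders assume a driver that accepts that style for non-prepared statements.

```python
def build_insert(table, row):
    """Build a CQL INSERT that skips columns whose value is None.

    Returns (statement, params). Skipped columns are never written, so no
    tombstone is created for them, but any older value already stored in
    such a column is not shadowed either.
    """
    # Keep only the columns that actually carry a value.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")
    columns = ", ".join(present)
    placeholders = ", ".join("%s" for _ in present)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(present.values())


# Hypothetical table and columns, for illustration only.
stmt, params = build_insert("events", {"id": 42, "name": "signup", "note": None})
```

The trade-off is the one already mentioned: this only makes sense when a null really means "nothing to write" rather than "clear whatever was there".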
If INSERT of null values is problematic even for small portions of your data, then it stands to reason that an INSERT option instructing Cassandra not to create tombstones would be an important performance optimization (and would also address the fact that non-null collections generate tombstones on INSERT as well).

INSERT INTO ... USING no_tombstones;

> There's thresholds (log messages, etc.) which operate on tombstone counts over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to clustering scans, right? I.e., tombstones don't count against those thresholds if they are not part of the clustering key column being considered for the non-EQ relation? The documentation certainly implies so:

tombstone_warn_threshold
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold>
(Default: 1000) The maximum number of tombstones a query can scan before warning.

tombstone_failure_threshold
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold>
(Default: 100000) The maximum number of tombstones a query can scan before aborting.

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens <migh...@gmail.com> wrote:
>
>> In the end, inserting a tombstone into a non-clustered column shouldn't
>> be appreciably worse (if it is at all) than inserting a value instead. Or
>> am I missing something here?
>>
>
> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.
>
> Given that tombstones are often smaller than data columns, sorta hard to
> understand conceptually?
>
> =Rob
>