I’ve added an option to trunk that prevents tombstone creation when using 
PreparedStatements; see CASSANDRA-7304.

The problem is having tombstones in regular columns.
When you perform a read request (range query or by PK):
- Cassandra iterates over all the cells (all, not only the cells specified in 
the query) in the relevant rows while counting tombstone cells 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java#L199)
- builds a ColumnFamily instance containing those rows
- filters the selected columns from the internal CF 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java#L653)
- returns the result

If you have many unnecessary tombstones, you read many unnecessary cells.
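
To make that concrete, here is a rough CQL sketch; the table and column names 
are made up for illustration and are not from this thread:

    -- hypothetical example table
    CREATE TABLE sensor_data (
        sensor_id   text,
        ts          timestamp,
        temperature double,
        humidity    double,
        notes       text,
        PRIMARY KEY (sensor_id, ts)
    );

    -- writing explicit nulls creates one cell tombstone per null column
    INSERT INTO sensor_data (sensor_id, ts, temperature, humidity, notes)
    VALUES ('s1', '2015-05-06 12:00:00', 21.5, null, null);

    -- even though only temperature is selected, the read still iterates
    -- over (and counts) the humidity/notes tombstones in the partition
    SELECT temperature FROM sensor_data WHERE sensor_id = 's1';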



From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Wednesday, May 06, 2015 4:37 PM
To: user@cassandra.apache.org
Subject: Re: Inserting null values

I agree that inserting null is not as good as not inserting that column at all 
when you have confidence that you are not shadowing any underlying data. But 
pragmatically speaking, it really doesn't sound like a small number of 
incidental nulls/tombstones (< 20% of columns, otherwise CASSANDRA-3442 takes 
over) is going to have any practical performance impact on either your query 
patterns or your compaction.
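
For reference, I believe the 20% figure corresponds to the tombstone_threshold 
compaction sub-property (default 0.2) that drives the single-SSTable tombstone 
compaction from CASSANDRA-3442; a hedged sketch of adjusting it, using the 
made-up sensor_data table from the sketch above:

    -- tombstone_threshold: estimated tombstone ratio above which a single
    -- SSTable becomes a candidate for tombstone compaction (default 0.2)
    ALTER TABLE sensor_data
    WITH compaction = { 'class': 'SizeTieredCompactionStrategy',
                        'tombstone_threshold': 0.2,
                        'tombstone_compaction_interval': 86400 };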

If INSERT of null values is problematic for small portions of your data, then 
it stands to reason that an INSERT option instructing Cassandra not to create 
tombstones would be an important performance optimization (and would also 
address the fact that non-null collections generate tombstones on INSERT).

  INSERT INTO ... USING no_tombstones;


> There's thresholds (log messages, etc.) which operate on tombstone counts 
> over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to 
clustering scans, right?  I.e., tombstones don't count against those thresholds 
if they are not part of the clustering key column being considered for the 
non-EQ relation?  The documentation certainly implies so:

tombstone_warn_threshold
(Default: 1000) The maximum number of tombstones a query can scan before 
warning.
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold>

tombstone_failure_threshold
(Default: 100000) The maximum number of tombstones a query can scan before 
aborting.
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold>
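
For what it's worth, here is what the two query shapes in question look like 
against the made-up sensor_data table from earlier in the thread; the open 
question is whether tombstones count toward these thresholds only for the 
first kind of read:

    -- clustering scan: non-EQ relation on the clustering column ts
    SELECT temperature FROM sensor_data
    WHERE sensor_id = 's1' AND ts > '2015-05-01';

    -- read by fully-specified primary key: no non-EQ clustering relation
    SELECT temperature FROM sensor_data
    WHERE sensor_id = 's1' AND ts = '2015-05-06 12:00:00';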

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli <rc...@eventbrite.com> wrote:
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens <migh...@gmail.com> wrote:
In the end, inserting a tombstone into a non-clustered column shouldn't be 
appreciably worse (if it is at all) than inserting a value instead.  Or am I 
missing something here?

There's thresholds (log messages, etc.) which operate on tombstone counts over 
a certain number, but not on column counts over the same number.

Given that tombstones are often smaller than data columns, sorta hard to 
understand conceptually?

=Rob

