On Sun, Jul 5, 2015 at 1:40 PM, Roman Tkachenko <ro...@mailgunhq.com> wrote:

> Hey guys,
>
> I have a table with RF=3 and LCS. Data model makes use of "wide rows". A
> certain query run against this table times out and tracing reveals the
> following error on two out of three nodes:
>
> *Scanned over 100000 tombstones; query aborted (see
> tombstone_failure_threshold)*
>
> This basically means every request with CL higher than "one" fails.
>
> I have two questions:
>
> * How could it happen that only two out of three nodes have overwhelming
> tombstones? For the third node tracing shows sensible *"Read 815 live and
> 837 tombstoned cells"* traces.
>

One theory: before 2.1.6, compactions on wide rows with lots of tombstones
could take forever or potentially never finish. What version of Cassandra
are you on? It may be that you got lucky with one node that has been able
to keep up but the others haven't been able to.


>
> * Anything I can do to fix those two nodes? I have already set gc_grace to
> 1 day and tried to make compaction strategy more aggressive
> (unchecked_tombstone_compaction - true, tombstone_threshold - 0.01) to no
> avail - a couple of days have already passed and it still gives the same
> error.
>

You probably want major compaction, which is coming soon for LCS
(https://issues.apache.org/jira/browse/CASSANDRA-7272) but is not here yet.

The alternative is, if you have enough time and headroom (this is going to
do some pretty serious compaction so be careful), alter your table to STCS,
let it compact into one SSTable, then convert back to LCS. It's pretty
heavy-handed but as long as your gc_grace is low enough it'll do the job.
Definitely do NOT do this if you have many tombstones in single wide rows
and are not on 2.1.6 or later.
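
For reference, the STCS detour boils down to two ALTER TABLE statements with a
wait in between. Here is a minimal sketch that just builds the CQL strings so
they can be pasted into cqlsh. The keyspace and table names (my_keyspace,
wide_rows) are placeholders I made up, and the helper function is purely
illustrative; it does not talk to a cluster or check that compaction has
finished.

```python
def compaction_roundtrip_statements(keyspace, table):
    """Return the CQL statements for the STCS detour, in the order to run them.

    The wait between the two statements (until the table has compacted down)
    is a manual step and is NOT automated here.
    """
    target = f"{keyspace}.{table}"
    return [
        # Step 1: switch to size-tiered compaction so SSTables get merged.
        f"ALTER TABLE {target} WITH compaction = "
        "{'class': 'SizeTieredCompactionStrategy'};",
        # Step 2: once compaction has settled, switch back to leveled.
        f"ALTER TABLE {target} WITH compaction = "
        "{'class': 'LeveledCompactionStrategy'};",
    ]


# Hypothetical keyspace/table names, for illustration only.
for statement in compaction_roundtrip_statements("my_keyspace", "wide_rows"):
    print(statement)
```

Run the first statement, watch compaction on each node until it settles, and
only then run the second one.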


>
> Thanks!
>
> Roman
>
>


-- 
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com
