If I understand the problem correctly, tombstone_failure_threshold is never
reached because the ~2M objects may have been collected by several queries
running in parallel rather than by a single query. No individual query
reached the threshold, although together they contributed to the OOM.
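
To illustrate what I mean, here is a toy model in Python. It is not
Cassandra's code: run_query, total_retained and the workload numbers (80
queries of 24,000 tombstones each) are made up for illustration, and only the
25000 threshold comes from your message. It assumes the threshold is checked
per query, which is my reading of the behaviour rather than something I have
verified in the source.

```python
FAILURE_THRESHOLD = 25_000  # tombstone_failure_threshold from the report


def run_query(tombstones_scanned, threshold=FAILURE_THRESHOLD):
    """Simulate one read: fail only if THIS query exceeds the threshold."""
    if tombstones_scanned > threshold:
        # Stand-in for TombstoneOverwhelmingException (simulated, not the real class)
        raise RuntimeError("tombstone threshold exceeded by a single query")
    # Otherwise the query succeeds and its tombstones stay on the heap
    return tombstones_scanned


def total_retained(per_query_counts):
    """Sum the tombstones held by all concurrently running queries."""
    return sum(run_query(n) for n in per_query_counts)


# Hypothetical workload: 80 parallel queries, each just under the threshold.
counts = [24_000] * 80
print(total_retained(counts))  # no query fails, yet the total is far above 25,000
```

In this model every query passes the check while the node as a whole still
holds about 1.9M tombstoned rows, which is the same order of magnitude as
what your heap dump shows.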

You can read a bit more about the anti-patterns (particularly, ones related
to workloads generating lots of tombstones):
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

You can also try running repairs/compactions more frequently. That said, I'd
look more closely at the read queries first, possibly with tracing on, and
check how many of them run in parallel. Lowering tombstone_warn_threshold may
also help you understand where the bounds are.

On Thu, Apr 28, 2016 at 7:23 PM Rick Gunderson <rgunder...@ca.ibm.com>
wrote:

> We are running Cassandra 2.2.3, 2 data centers, 3 nodes in each. The
> replication factor per datacenter is 3. The Xmx setting on the Cassandra
> JVMs is 4GB.
>
> We have a workload that generates lots of tombstones and Cassandra goes
> OOM in about 24 hours. We've adjusted the tombstone_failure_threshold down
> to 25000 but we never see the TombstoneOverwhelmingException before the
> nodes start going OOM.
>
> The table operation that looks to be the culprit is a scan of partition
> keys (i.e. we are scanning across narrow rows, not scanning within a wide
> row). The heapdump shows we have a RangeSliceReply containing an ArrayList
> with 1,823,230 org.apache.cassandra.db.Row objects with a retained heap
> size of 441MiB.  A look inside one of the Row objects shows an
> org.apache.cassandra.db.DeletionInfo object so I assume that means the row
> has been tombstoned.
>
> If all of the 1,823,239 Row objects are tombstoned (and it is likely that
> most of them are), is there a reason that the
> TombstoneOverwhelmingException never gets thrown?
>
>
>
> Regards,
>
> *Rick (R.) Gunderson *
> Software Engineer
> IBM Commerce, B2B Development - GDHA
> ------------------------------
> *Phone:* 1-250-220-1053
>
> *E-mail:* rgunder...@ca.ibm.com
> *Find me on:* LinkedIn:
> http://ca.linkedin.com/pub/rick-gunderson/0/443/241
>
> 1803 Douglas St
> Victoria, BC V8T 5C3
> Canada
>
>
--
Alex