Hi Jiri,
We do run multiple nodes with 2TB to 4TB of data each, and we usually see GC 
pressure when we create a lot of tombstones.
With Cassandra 2.0.x you would be able to see a log entry with the following pattern:
WARN [ReadStage:7] 2015-02-08 22:55:09,621 SliceQueryFilter.java (line 225) 
Read 939 live and 1017 tombstoned cells in SyncCore.ContactInformation (see 
tombstone_warn_threshold). 1000 columns was requested, slices=[-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
This basically indicates that you have a large number of deletions (tombstones) for a given row.
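If you want to adjust when that warning fires (or when a read gets aborted outright), the thresholds live in cassandra.yaml. As a rough sketch, with the usual 2.0.x defaults (double-check against your own config):

    # cassandra.yaml - tombstone thresholds (2.0.x defaults shown)
    tombstone_warn_threshold: 1000       # warn when a single read scans this many tombstones
    tombstone_failure_threshold: 100000  # abort the read beyond this many tombstones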
Thanks,

FR
     From: Mark Reddy <mark.l.re...@gmail.com>
 To: user@cassandra.apache.org 
Cc: cassandra-u...@apache.org; FF Systems <ff-sys...@avast.com> 
 Sent: Sunday, February 8, 2015 1:32 PM
 Subject: Re: High GC activity on node with 4TB on data
   
Hey Jiri, 
While I don't have any experience running 4TB nodes (yet), I would recommend 
taking a look at a presentation by Aaron Morton on large nodes: 
http://planetcassandra.org/blog/cassandra-community-webinar-videoslides-large-nodes-with-cassandra-by-aaron-morton/
 to see if you can glean anything from that.
I would note that at the start of his talk he mentions that in version 1.2 we 
can now talk about nodes around 1 - 3 TB in size, so if you are storing 
anything more than that you are getting into very specialised use cases.
If you could provide us with some more information about your cluster setup 
(No. of CFs, read/write patterns, whether you delete / update often, etc.), that would 
help in getting you to a better place.
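For example, the output of a few standard nodetool commands from the affected node would be a good starting point (assuming you can run them there):

    nodetool cfstats           # per-CF sizes, read/write counts, memtable and bloom filter stats
    nodetool tpstats           # thread pool backlog and dropped message counts
    nodetool compactionstats   # pending / active compactions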

Regards,
Mark


On 8 February 2015 at 21:10, Kevin Burton <bur...@spinn3r.com> wrote:

Do you have a lot of individual tables?  Or lots of small compactions?
I think the general consensus is that, at least for Cassandra, 8GB heaps are 
ideal.
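Just as a rough sketch (example values only, not tuned for your hardware), the heap is set in conf/cassandra-env.sh:

    # conf/cassandra-env.sh - example values only
    MAX_HEAP_SIZE="8G"      # total heap for the Cassandra JVM
    HEAP_NEWSIZE="800M"     # young generation; often sized at roughly 100MB per physical core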
If you have lots of small tables it’s a known anti-pattern (I believe) because 
the Cassandra internals could do a better job of handling the in-memory 
metadata representation.
I think this has been improved in 2.0 and 2.1, though, so the fact that you’re on 
1.2.18 could exacerbate the issue.  You might want to consider an upgrade 
(though that has its own issues as well).
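Either way, it's worth confirming where the GC time is actually going. A minimal sketch of the usual HotSpot GC logging flags (typically appended to JVM_OPTS in cassandra-env.sh; exact flag names may differ by JVM version):

    # cassandra-env.sh - enable GC logging (standard HotSpot flags)
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution -XX:+PrintPromotionFailure"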
On Sun, Feb 8, 2015 at 12:44 PM, Jiri Horky <ho...@avast.com> wrote:

Hi all,

we are seeing quite high GC pressure (in the old generation, with the CMS
GC algorithm) on a node with 4TB of data. It runs C* 1.2.18 with 12G of heap
memory (2G for the new generation). The node runs fine for a couple of days,
then the GC activity starts to rise and reaches about 15% of the C* activity,
which causes dropped messages and other problems.

Taking a look at heap dump, there is about 8G used by SSTableReader
classes in org.apache.cassandra.io.compress.CompressedRandomAccessReader.

Is this expected, i.e. have we simply reached the limit of how much data a
single Cassandra instance can handle, or is it possible to tune it better?

Regards
Jiri Horky




-- 
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com… or check out my Google+ profile



  
