Sorry to jump on this late. GC is one of my favorite topics. A while ago I wrote a blog post about C* GC tuning and documented several issues that I had experienced. It seems to have helped some people in the past, so I am sharing it here:
http://aryanet.com/blog/cassandra-garbage-collector-tuning

On Thu, Feb 12, 2015 at 11:08 AM, Jiri Horky <ho...@avast.com> wrote:

> Number of cores: 2x6 cores x 2 (HT).
>
> I do agree with you that the hardware is certainly overestimated for just one Cassandra, but we got a very good price since we ordered several 10s of the same nodes for a different project. That's why we use it for multiple Cassandra instances.
>
> Jirka H.
>
> On 02/12/2015 04:18 PM, Eric Stevens wrote:
>
> > each node has 256G of memory, 24x1T drives, 2x Xeon CPU
>
> I don't have first-hand experience running Cassandra on such massive hardware, but it strikes me that these machines are dramatically oversized to be good candidates for Cassandra (though I wonder how many cores are in those CPUs; I'm guessing closer to 18 than 2 based on the other hardware).
>
> A larger cluster of smaller hardware would be a much better shape for Cassandra. Or several clusters of smaller hardware, since you're running multiple instances on this hardware - best practice is one instance per host no matter the hardware size.
>
> On Thu, Feb 12, 2015 at 12:36 AM, Jiri Horky <ho...@avast.com> wrote:
>
>> Hi Chris,
>>
>> On 02/09/2015 04:22 PM, Chris Lohfink wrote:
>>
>> - number of tombstones - how can I reliably find it out?
>> https://github.com/spotify/cassandra-opstools
>> https://github.com/cloudian/support-tools
>>
>> thanks.
>>
>> If you're not getting much compression it may be worth trying to disable it; it may contribute, but it's very unlikely to be the cause of the GC pressure itself.
>>
>> 7000 sstables but STCS? Sounds like compactions couldn't keep up. Do you have a lot of pending compactions (nodetool)? You may want to increase your compaction throughput (nodetool) to see if you can catch up a little; it causes a lot of heap overhead to do reads with that many sstables. You may even need to take more drastic measures if it can't catch back up.
>>
>> I am sorry, I was wrong. We actually do use LCS (the switch was done recently). There are almost no pending compactions. We have increased the sstable size to 768M, so it should help as well.
>>
>> May also be good to check `nodetool cfstats` for very wide partitions.
>>
>> There are basically none, this is fine.
>>
>> It seems that the problem really comes from having so much data in so many sstables, so the org.apache.cassandra.io.compress.CompressedRandomAccessReader classes consume more memory than 0.75*HEAP_SIZE, which triggers the CMS over and over.
>>
>> We have turned off the compression and so far, the situation seems to be fine.
>>
>> Cheers
>> Jirka H.
>>
>> There's a good chance that if you're under load and have over an 8 GB heap, your GCs could use tuning. The bigger the nodes, the more manual tweaking they will require to get the most out of them. https://issues.apache.org/jira/browse/CASSANDRA-8150 also has some ideas.
>>
>> Chris
>>
>> On Mon, Feb 9, 2015 at 2:00 AM, Jiri Horky <ho...@avast.com> wrote:
>>
>>> Hi all,
>>>
>>> thank you all for the info.
>>>
>>> To answer the questions:
>>> - we have 2 DCs with 5 nodes in each; each node has 256G of memory, 24x1T drives, 2x Xeon CPU - there are multiple Cassandra instances running for different projects. The node itself is powerful enough.
>>> - there are 2 keyspaces, one with 3 replicas per DC, one with 1 replica per DC (because of the amount of data and because it serves more or less like a cache)
>>> - there are about 4k/s Request-response, 3k/s Read and 2k/s Mutation requests - the numbers are a sum over all nodes
>>> - we use STCS (LCS would be quite IO-heavy for this amount of data)
>>> - number of tombstones - how can I reliably find it out?
>>> - the biggest CF (3.6T per node) has 7000 sstables
>>>
>>> Now, I understand that the best practice for Cassandra is to run "with the minimum size of heap which is enough", which in this case we thought is about 12G - there is always 8G consumed by the SSTable readers. Also, I thought that a high number of tombstones creates pressure in the new space (which can then cause pressure in the old space as well), but this is not what we are seeing. We see continuous GC activity in the Old generation only.
>>>
>>> Also, I noticed that the biggest CF has a compression factor of 0.99, which basically means that the data come compressed already. Do you think that turning off the compression should help with memory consumption?
>>>
>>> Also, I think that tuning CMSInitiatingOccupancyFraction=75 might help here, as it seems that 8G is something that Cassandra needs for bookkeeping this amount of data and that this was slightly above the 75% limit, which triggered the CMS again and again.
>>>
>>> I will definitely have a look at the presentation.
>>>
>>> Regards
>>> Jiri Horky
>>>
>>> On 02/08/2015 10:32 PM, Mark Reddy wrote:
>>>
>>> Hey Jiri,
>>>
>>> While I don't have any experience running 4TB nodes (yet), I would recommend taking a look at a presentation by Aaron Morton on large nodes: http://planetcassandra.org/blog/cassandra-community-webinar-videoslides-large-nodes-with-cassandra-by-aaron-morton/ to see if you can glean anything from that.
>>>
>>> I would note that at the start of his talk he mentions that in version 1.2 we can now talk about nodes around 1 - 3 TB in size, so if you are storing anything more than that you are getting into very specialised use cases.
>>>
>>> If you could provide us with some more information about your cluster setup (no. of CFs, read/write patterns, do you delete / update often, etc.) that may help in getting you to a better place.
>>>
>>> Regards,
>>> Mark
>>>
>>> On 8 February 2015 at 21:10, Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> Do you have a lot of individual tables? Or lots of small compactions?
>>>>
>>>> I think the general consensus is that (at least for Cassandra), 8GB heaps are ideal.
>>>>
>>>> If you have lots of small tables it's a known anti-pattern (I believe) because the Cassandra internals could do a better job of handling the in-memory metadata representation.
>>>>
>>>> I think this has been improved in 2.0 and 2.1, though, so the fact that you're on 1.2.18 could exacerbate the issue. You might want to consider an upgrade (though that has its own issues as well).
>>>>
>>>> On Sun, Feb 8, 2015 at 12:44 PM, Jiri Horky <ho...@avast.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> we are seeing quite high GC pressure (in the old space, with the CMS GC algorithm) on a node with 4TB of data. It runs C* 1.2.18 with 12G of heap memory (2G for the new space). The node runs fine for a couple of days, then the GC activity starts to rise and reaches about 15% of the C* activity, which causes dropped messages and other problems.
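(For reference, the heap and CMS settings being discussed live in conf/cassandra-env.sh. A minimal sketch with the numbers from this thread follows; it is illustrative only - the JVM_OPTS lines simply mirror the stock 1.2-era script - so double-check against your own copy of the file.)

    # conf/cassandra-env.sh -- sketch only, values taken from this thread
    MAX_HEAP_SIZE="12G"      # total heap mentioned above
    HEAP_NEWSIZE="2G"        # new-generation size mentioned above

    # CMS options as they appear in the stock 1.2-era cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

Note that raising CMSInitiatingOccupancyFraction only buys headroom: if the steady-state live set (the ~8G of readers mentioned below) sits near the threshold, CMS will still run back to back.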
>>>>> Taking a look at a heap dump, there is about 8G used by SSTableReader classes in org.apache.cassandra.io.compress.CompressedRandomAccessReader.
>>>>>
>>>>> Is this something expected and we have just reached the limit of how much data a single Cassandra instance can handle, or is it possible to tune it better?
>>>>>
>>>>> Regards
>>>>> Jiri Horky
>>>>
>>>> --
>>>> Founder/CEO Spinn3r.com
>>>> Location: San Francisco, CA
>>>> blog: http://burtonator.wordpress.com
>>>> ... or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>
>>>> <http://spinn3r.com>

--
Cheers,
-Arya
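P.S. For anyone finding this thread in the archives, here is a rough sketch of the checks and changes discussed above as shell commands. The nodetool subcommand names are from the 1.2/2.0-era tooling, "my_ks"/"my_cf" are placeholders, and the empty sstable_compression value is the pre-3.0 CQL way of disabling compression - verify against your own version before running anything.

    # Is compaction keeping up? (pending tasks)
    nodetool compactionstats

    # Temporarily raise the compaction throughput cap (MB/s) to let it catch up
    nodetool setcompactionthroughput 64

    # Per-CF stats: sstable count, SSTable Compression Ratio, compacted row max size (wide partitions)
    nodetool cfstats

    # Partition size / column count distribution for a single CF
    nodetool cfhistograms my_ks my_cf

    # If the data is effectively incompressible (ratio ~0.99), disable sstable compression (pre-3.0 syntax)
    echo "ALTER TABLE my_ks.my_cf WITH compression = {'sstable_compression': ''};" | cqlsh

Disabling compression only removes the CompressedRandomAccessReader overhead for newly written sstables; existing ones keep their format until they are recompacted or rewritten (e.g. via nodetool upgradesstables).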