16 Gig heap, with G1. Pertinent info from jvm.options below (we’re using m2.2xlarge instances in AWS):
#################
# HEAP SETTINGS #
#################

# Heap size is automatically calculated by cassandra-env based on this
# formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# That is:
# - calculate 1/2 ram and cap to 1024MB
# - calculate 1/4 ram and cap to 8192MB
# - pick the max
#
# For production use you may wish to adjust this for your environment.
# If that's the case, uncomment the -Xmx and Xms options below to override the
# automatic calculation of JVM heap memory.
#
# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
-Xms16G
-Xmx16G

# Young generation size is automatically calculated by cassandra-env
# based on this formula: min(100 * num_cores, 1/4 * heap size)
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# It is not recommended to set the young generation size if using the
# G1 GC, since that will override the target pause-time goal.
# More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
#
# The example below assumes a modern 8-core+ machine for decent
# times. If in doubt, and if you do not particularly want to tweak, go
# 100 MB per physical CPU core.
#-Xmn800M

#################
#  GC SETTINGS  #
#################

### CMS Settings
#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=10000
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
# some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
#-XX:+CMSClassUnloadingEnabled

### G1 Settings (experimental, comment previous section and uncomment section below to enable)
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and vise versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
-XX:MaxGCPauseMillis=500

## Optional G1 Settings

# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
-XX:InitiatingHeapOccupancyPercent=70

# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
#-XX:ParallelGCThreads=16

# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
#-XX:ConcGCThreads=16

### GC logging options -- uncomment to enable

#-XX:+PrintGCDetails
#-XX:+PrintGCDateStamps
#-XX:+PrintHeapAtGC
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
#-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=10
#-XX:GCLogFileSize=10M
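For context, a rough sketch (Python) of the automatic heap and young-generation calculations described in the comments above. The ~34 GiB RAM and 4 vCPU figures are assumptions taken from the published m2.2xlarge specs, not numbers reported in this thread.

# Sketch of the cassandra-env sizing formulas quoted in the comments above.
# RAM and core counts below are assumptions (published m2.2xlarge specs),
# not values reported in this thread.

def default_max_heap_mb(ram_mb):
    # max(min(1/2 ram, 1024MB), min(1/4 ram, 8192MB))
    return max(min(ram_mb // 2, 1024), min(ram_mb // 4, 8192))

def default_young_gen_mb(num_cores, heap_mb):
    # min(100 MB * num_cores, 1/4 * heap size); only relevant for CMS,
    # since G1 sizes the young generation itself.
    return min(100 * num_cores, heap_mb // 4)

ram_mb = 34 * 1024   # ~34 GiB on an m2.2xlarge (assumption)
cores = 4            # m2.2xlarge vCPUs (assumption)

print(default_max_heap_mb(ram_mb))        # 8192 -> 8 GB default heap
print(default_young_gen_mb(cores, 8192))  # 400  -> ~100 MB per core

By the default formula this instance would get an 8 GB heap, so the explicit -Xms16G / -Xmx16G above doubles it.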
From: Alexander Dejanovski <a...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, April 3, 2017 at 8:00 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: cassandra OOM

Hi,

Could you share your GC settings? G1 or CMS? Heap size, etc.?

Thanks,

On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva <dhruva.go...@aspect.com> wrote:

Hi – We've had what looks like an OOM situation with Cassandra (we have a dump file that got generated) in our staging (performance/load testing) environment, and I wanted to reach out to this user group to see if you had any recommendations on how we should approach our investigation into the cause of this issue. The logs don't seem to point to any obvious issues, and we're no experts in analyzing this by any means, so we were looking for guidance on how to proceed. Should we enter a Jira as well? We're on Cassandra 3.9 and are running a six-node cluster. This happened in a controlled load testing environment. Feedback will be much appreciated!

Regards,
Dhruva

--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.
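As a follow-up on the investigation question: if the GC logging options quoted above are uncommented, one low-effort check is to scan gc.log for long stop-the-world pauses. A minimal sketch (Python), assuming the JDK 8 -XX:+PrintGCApplicationStoppedTime line format ("Total time for which application threads were stopped: N seconds") and the /var/log/cassandra/gc.log path from the settings above; the 1-second threshold is an arbitrary example.

import re

# Report the longest application stop times recorded in a JDK 8 gc.log.
# Log path and the 1-second threshold are example values, not from the thread.
PAUSE_RE = re.compile(r"application threads were stopped: ([0-9.]+) seconds")

pauses = []
with open("/var/log/cassandra/gc.log") as log:
    for line in log:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(float(match.group(1)))

pauses.sort(reverse=True)
print("longest pauses (s):", pauses[:10])
print("pauses over 1s:", sum(1 for p in pauses if p > 1.0))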