Hello. I'm having problems with frequent Full GCs that take a long time, and cause a burst of timeouts for the client application. But first, here is my configuration:
Cassandra: 1.1.5 Key Cache : size 209715168 (bytes), capacity 209715168 (bytes), 1331992571 hits, 1831790912 requests, 0.854 recent hit rate, 14400 save period in seconds Row Cache : size 104588313 (bytes), capacity 104857600 (bytes), 585590776 hits, 646992444 requests, 0.880 recent hit rate, 0 save period in seconds flush_largest_memtables_at: 0.8 memtable_total_space_in_mb: 2048 JVM: java version "1.6.0_35" Java(TM) SE Runtime Environment (build 1.6.0_35-b10) Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode) JVM Options: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:ParallelCMSThreads=4 -Xms16G -Xmx16G -Xmn800M -Xss196k 6 nodes with 32GB RAM, Intel(R) Xeon(R) CPU E5-2609, 300GB data per node. Current status: I had to raise the heap to 16GB due to constant very high heap usage. I've since lowered the bf ratio, but still haven't got to lowering the heap size because I'm still not sure a 8GB heap can handle the load per node. Things I plan to do: - upgrade to Cassandra 1.2. Supposedly it makes better use of the heap. - add more nodes. Still waiting for the hardware. - test again with lower heap sizes. - test JVM 7 (G1?). I've read somewhere that JVM 6 is recommended for Cassandra but Cassandra 2.0 requires JVM 7 so... Besides these, I would also like to double-check my current GC parameters. I'm thinking of increasing the NewSize to 1200M, so that less garbage ends up in the old generation. But here are the GC logs, maybe someone has a better idea? 2013-09-20T07:32:59.090+0100: 141675.121: [GC 141675.121: [ParNew (2: promotion failure size = 9564) (promotion failed) Desired survivor size 41943040 bytes, new threshold 1 (max 1) - age 1: 569496 bytes, 569496 total : 85346K->67732K(737280K), 0.1206690 secs] 14838953K->14882152K(16695296K), 0.1210110 secs] [Times: user=0.58 sys=0.01, real=0.12 secs] Heap after GC invocations=190322 (full 281): par new generation total 737280K, used 67732K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 3% used [0x00000003fae00000, 0x00000003fc212690, 0x0000000422e00000) from space 81920K, 57% used [0x0000000427e00000, 0x000000042ac12988, 0x000000042ce00000) to space 81920K, 79% used [0x0000000422e00000, 0x0000000426d46538, 0x0000000427e00000) concurrent mark-sweep generation total 15958016K, used 14814420K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) } GC locker: Trying a full collection because scavenge failed {Heap before GC invocations=190322 (full 281): par new generation total 737280K, used 67732K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 3% used [0x00000003fae00000, 0x00000003fc212690, 0x0000000422e00000) from space 81920K, 57% used [0x0000000427e00000, 0x000000042ac12988, 0x000000042ce00000) to space 81920K, 79% used [0x0000000422e00000, 0x0000000426d46538, 0x0000000427e00000) concurrent mark-sweep generation total 15958016K, used 14814420K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) 2013-09-20T07:32:59.212+0100: 141675.242: [Full GC 141675.242: [CMS2013-09-20T07:33:06.691+0100: 141682.721: [CMS-concurrent-sweep: 19.719/29.300 secs] [Times: user=146.19 sys=3.74, real=29.31 secs] (concurrent mode failure): 14814420K->7653329K(15958016K), 35.8052520 secs] 14882152K->7653329K(16695296K), [CMS Perm : 26411K->26355K(44188K)], 35.8055760 secs] [Times: user=35.74 sys=0.04, real=35.80 secs] Heap after GC invocations=190323 (full 282): par new generation total 737280K, used 0K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000) from space 81920K, 0% used [0x0000000427e00000, 0x0000000427e00000, 0x000000042ce00000) to space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000) concurrent mark-sweep generation total 15958016K, used 7653329K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26355K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) } Total time for which application threads were stopped: 35.9305810 seconds Total time for which application threads were stopped: 0.0030240 seconds ... 2013-09-20T09:58:41.765+0100: 150417.795: [GC 150417.796: [ParNew Desired survivor size 41943040 bytes, new threshold 1 (max 1) - age 1: 83887848 bytes, 83887848 total : 674422K->81920K(737280K), 0.0680000 secs] 15824090K->15233726K(16695296K), 0.0685120 secs] [Times: user=0.54 sys=0.00, real=0.07 secs] Heap after GC invocations=214529 (full 323): par new generation total 737280K, used 81920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000) from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000) to space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000) concurrent mark-sweep generation total 15958016K, used 15151806K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) } Total time for which application threads were stopped: 0.0721100 seconds {Heap before GC invocations=214529 (full 323): par new generation total 737280K, used 82214K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000) from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000) to space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000) concurrent mark-sweep generation total 15958016K, used 15151806K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) 2013-09-20T09:58:41.837+0100: 150417.868: [GC 150417.868: [ParNew (0: promotion failure size = 1677) (1: promotion failure size = 2584) (2: promotion failure size = 2584) (3: promotion failure size = 1723) (4: promotion failure size = 1723) (7: promotion failure size = 4766) (promotion failed) Desired survivor size 41943040 bytes, new threshold 1 (max 1) - age 1: 10688 bytes, 10688 total : 82214K->62920K(737280K), 0.4677070 secs] 15234020K->15298061K(16695296K), 0.4681720 secs] [Times: user=1.15 sys=0.01, real=0.46 secs] Heap after GC invocations=214530 (full 323): par new generation total 737280K, used 62920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000) from space 81920K, 76% used [0x0000000422e00000, 0x0000000426b28940, 0x0000000427e00000) to space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000) concurrent mark-sweep generation total 15958016K, used 15235140K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) } GC locker: Trying a full collection because scavenge failed {Heap before GC invocations=214530 (full 323): par new generation total 737280K, used 62920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000) from space 81920K, 76% used [0x0000000422e00000, 0x0000000426b28940, 0x0000000427e00000) to space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000) concurrent mark-sweep generation total 15958016K, used 15235140K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) 2013-09-20T09:58:42.306+0100: 150418.336: [Full GC 150418.336: [CMS2013-09-20T09:58:46.720+0100: 150422.751: [CMS-concurrent-preclean: 7.057/10.443 secs] [Times: user=45.24 sys=0.32, real=10.44 secs] (concurrent mode failure): 15235140K->5325067K(15958016K), 26.3214360 secs] 15298061K->5325067K(16695296K), [CMS Perm : 26424K->26343K(44188K)], 26.3218650 secs] [Times: user=26.29 sys=0.02, real=26.33 secs] Heap after GC invocations=214531 (full 324): par new generation total 737280K, used 0K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000) eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000) from space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000) to space 81920K, 0% used [0x0000000427e00000, 0x0000000427e00000, 0x000000042ce00000) concurrent mark-sweep generation total 15958016K, used 5325067K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 44188K, used 26343K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) } Total time for which application threads were stopped: 26.7940000 seconds Total time for which application threads were stopped: 0.0408350 seconds Total time for which application threads were stopped: 0.0264510 seconds Thanks for the help, André Cruz
signature.asc
Description: Message signed with OpenPGP using GPGMail