Your ParNew size is way too small. Generally 4GB ParNew (-Xmn) works out best for 16GB heap
On Mon, Sep 23, 2013 at 9:05 PM, 谢良 <xieli...@xiaomi.com> wrote: > it looks to me that "MaxTenuringThreshold" is too small, do you have any > chance to try with a bigger one, like 4 or 8 or sth else? > > ________________________________________ > 发件人: André Cruz [andre.c...@co.sapo.pt] > 发送时间: 2013年9月23日 22:51 > 收件人: user@cassandra.apache.org > 主题: Frequent Full GC that take > 30s > > Hello. > > I'm having problems with frequent Full GCs that take a long time, and > cause a burst of timeouts for the client application. But first, here is my > configuration: > > Cassandra: 1.1.5 > Key Cache : size 209715168 (bytes), capacity 209715168 (bytes), > 1331992571 hits, 1831790912 requests, 0.854 recent hit rate, 14400 save > period in seconds > Row Cache : size 104588313 (bytes), capacity 104857600 (bytes), > 585590776 hits, 646992444 requests, 0.880 recent hit rate, 0 save period in > seconds > flush_largest_memtables_at: 0.8 > memtable_total_space_in_mb: 2048 > > JVM: > java version "1.6.0_35" > Java(TM) SE Runtime Environment (build 1.6.0_35-b10) > Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode) > > JVM Options: > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled > -XX:SurvivorRatio=8 > -XX:MaxTenuringThreshold=1 > -XX:CMSInitiatingOccupancyFraction=75 > -XX:+UseCMSInitiatingOccupancyOnly > -XX:ParallelCMSThreads=4 > -Xms16G > -Xmx16G > -Xmn800M > -Xss196k > > 6 nodes with 32GB RAM, Intel(R) Xeon(R) CPU E5-2609, 300GB data per node. > > > Current status: > I had to raise the heap to 16GB due to constant very high heap usage. I've > since lowered the bf ratio, but still haven't got to lowering the heap size > because I'm still not sure a 8GB heap can handle the load per node. > > Things I plan to do: > - upgrade to Cassandra 1.2. Supposedly it makes better use of the heap. > - add more nodes. Still waiting for the hardware. > - test again with lower heap sizes. > - test JVM 7 (G1?). I've read somewhere that JVM 6 is recommended for > Cassandra but Cassandra 2.0 requires JVM 7 so... > > Besides these, I would also like to double-check my current GC parameters. > I'm thinking of increasing the NewSize to 1200M, so that less garbage ends > up in the old generation. But here are the GC logs, maybe someone has a > better idea? > > 2013-09-20T07:32:59.090+0100: 141675.121: [GC 141675.121: [ParNew (2: > promotion failure size = 9564) (promotion failed) > Desired survivor size 41943040 bytes, new threshold 1 (max 1) > - age 1: 569496 bytes, 569496 total > : 85346K->67732K(737280K), 0.1206690 secs] > 14838953K->14882152K(16695296K), 0.1210110 secs] [Times: user=0.58 > sys=0.01, real=0.12 secs] > Heap after GC invocations=190322 (full 281): > par new generation total 737280K, used 67732K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 3% used [0x00000003fae00000, 0x00000003fc212690, > 0x0000000422e00000) > from space 81920K, 57% used [0x0000000427e00000, 0x000000042ac12988, > 0x000000042ce00000) > to space 81920K, 79% used [0x0000000422e00000, 0x0000000426d46538, > 0x0000000427e00000) > concurrent mark-sweep generation total 15958016K, used 14814420K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26411K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > } > GC locker: Trying a full collection because scavenge failed > {Heap before GC invocations=190322 (full 281): > par new generation total 737280K, used 67732K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 3% used [0x00000003fae00000, 0x00000003fc212690, > 0x0000000422e00000) > from space 81920K, 57% used [0x0000000427e00000, 0x000000042ac12988, > 0x000000042ce00000) > to space 81920K, 79% used [0x0000000422e00000, 0x0000000426d46538, > 0x0000000427e00000) > concurrent mark-sweep generation total 15958016K, used 14814420K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26411K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > 2013-09-20T07:32:59.212+0100: 141675.242: [Full GC 141675.242: > [CMS2013-09-20T07:33:06.691+0100: 141682.721: [CMS-concurrent-sweep: > 19.719/29.300 secs] [Times: user=146.19 sys=3.74, real=29.31 secs] > (concurrent mode failure): 14814420K->7653329K(15958016K), 35.8052520 > secs] 14882152K->7653329K(16695296K), [CMS Perm : 26411K->26355K(44188K)], > 35.8055760 secs] [Times: user=35.74 sys=0.04, real=35.80 secs] > Heap after GC invocations=190323 (full 282): > par new generation total 737280K, used 0K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae00000, > 0x0000000422e00000) > from space 81920K, 0% used [0x0000000427e00000, 0x0000000427e00000, > 0x000000042ce00000) > to space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, > 0x0000000427e00000) > concurrent mark-sweep generation total 15958016K, used 7653329K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26355K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > } > Total time for which application threads were stopped: 35.9305810 seconds > Total time for which application threads were stopped: 0.0030240 seconds > > > ... > > > 2013-09-20T09:58:41.765+0100: 150417.795: [GC 150417.796: [ParNew > Desired survivor size 41943040 bytes, new threshold 1 (max 1) > - age 1: 83887848 bytes, 83887848 total > : 674422K->81920K(737280K), 0.0680000 secs] > 15824090K->15233726K(16695296K), 0.0685120 secs] [Times: user=0.54 > sys=0.00, real=0.07 secs] > Heap after GC invocations=214529 (full 323): > par new generation total 737280K, used 81920K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae00000, > 0x0000000422e00000) > from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, > 0x000000042ce00000) > to space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, > 0x0000000427e00000) > concurrent mark-sweep generation total 15958016K, used 15151806K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26424K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > } > Total time for which application threads were stopped: 0.0721100 seconds > {Heap before GC invocations=214529 (full 323): > par new generation total 737280K, used 82214K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae49998, > 0x0000000422e00000) > from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, > 0x000000042ce00000) > to space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, > 0x0000000427e00000) > concurrent mark-sweep generation total 15958016K, used 15151806K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26424K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > 2013-09-20T09:58:41.837+0100: 150417.868: [GC 150417.868: [ParNew (0: > promotion failure size = 1677) (1: promotion failure size = 2584) (2: > promotion failure size = 2584) (3: promotion failure size = 1723) (4: > promotion failure size = 1723) (7: promotion failure size > = 4766) (promotion failed) > Desired survivor size 41943040 bytes, new threshold 1 (max 1) > - age 1: 10688 bytes, 10688 total > : 82214K->62920K(737280K), 0.4677070 secs] > 15234020K->15298061K(16695296K), 0.4681720 secs] [Times: user=1.15 > sys=0.01, real=0.46 secs] > Heap after GC invocations=214530 (full 323): > par new generation total 737280K, used 62920K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae49998, > 0x0000000422e00000) > from space 81920K, 76% used [0x0000000422e00000, 0x0000000426b28940, > 0x0000000427e00000) > to space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, > 0x000000042ce00000) > concurrent mark-sweep generation total 15958016K, used 15235140K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26424K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > } > GC locker: Trying a full collection because scavenge failed > {Heap before GC invocations=214530 (full 323): > par new generation total 737280K, used 62920K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae49998, > 0x0000000422e00000) > from space 81920K, 76% used [0x0000000422e00000, 0x0000000426b28940, > 0x0000000427e00000) > to space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, > 0x000000042ce00000) > concurrent mark-sweep generation total 15958016K, used 15235140K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26424K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > 2013-09-20T09:58:42.306+0100: 150418.336: [Full GC 150418.336: > [CMS2013-09-20T09:58:46.720+0100: 150422.751: [CMS-concurrent-preclean: > 7.057/10.443 secs] [Times: user=45.24 sys=0.32, real=10.44 secs] > (concurrent mode failure): 15235140K->5325067K(15958016K), 26.3214360 > secs] 15298061K->5325067K(16695296K), [CMS Perm : 26424K->26343K(44188K)], > 26.3218650 secs] [Times: user=26.29 sys=0.02, real=26.33 secs] > Heap after GC invocations=214531 (full 324): > par new generation total 737280K, used 0K [0x00000003fae00000, > 0x000000042ce00000, 0x000000042ce00000) > eden space 655360K, 0% used [0x00000003fae00000, 0x00000003fae00000, > 0x0000000422e00000) > from space 81920K, 0% used [0x0000000422e00000, 0x0000000422e00000, > 0x0000000427e00000) > to space 81920K, 0% used [0x0000000427e00000, 0x0000000427e00000, > 0x000000042ce00000) > concurrent mark-sweep generation total 15958016K, used 5325067K > [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 44188K, used 26343K > [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000) > } > Total time for which application threads were stopped: 26.7940000 seconds > Total time for which application threads were stopped: 0.0408350 seconds > Total time for which application threads were stopped: 0.0264510 seconds > > > > Thanks for the help, > André Cruz >