Hello.

I'm having problems with frequent Full GCs that take a long time, and cause a 
burst of timeouts for the client application. But first, here is my 
configuration:

Cassandra: 1.1.5
Key Cache        : size 209715168 (bytes), capacity 209715168 (bytes), 
1331992571 hits, 1831790912 requests, 0.854 recent hit rate, 14400 save period 
in seconds
Row Cache        : size 104588313 (bytes), capacity 104857600 (bytes), 
585590776 hits, 646992444 requests, 0.880 recent hit rate, 0 save period in 
seconds
flush_largest_memtables_at: 0.8
memtable_total_space_in_mb: 2048

JVM: 
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

JVM Options:
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:ParallelCMSThreads=4
-Xms16G
-Xmx16G
-Xmn800M
-Xss196k

6 nodes with 32GB RAM, Intel(R) Xeon(R) CPU E5-2609, 300GB data per node.


Current status:
I had to raise the heap to 16GB due to constant very high heap usage. I've 
since lowered the bf ratio, but still haven't got to lowering the heap size 
because I'm still not sure a 8GB heap can handle the load per node.

Things I plan to do:
- upgrade to Cassandra 1.2. Supposedly it makes better use of the heap.
- add more nodes. Still waiting for the hardware.
- test again with lower heap sizes.
- test JVM 7 (G1?). I've read somewhere that JVM 6 is recommended for Cassandra 
but Cassandra 2.0 requires JVM 7 so...

Besides these, I would also like to double-check my current GC parameters. I'm 
thinking of increasing the NewSize to 1200M, so that less garbage ends up in 
the old generation. But here are the GC logs, maybe someone has a better idea?

2013-09-20T07:32:59.090+0100: 141675.121: [GC 141675.121: [ParNew (2: promotion 
failure size = 9564)  (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:     569496 bytes,     569496 total
: 85346K->67732K(737280K), 0.1206690 secs] 14838953K->14882152K(16695296K), 
0.1210110 secs] [Times: user=0.58 sys=0.01, real=0.12 secs] 
Heap after GC invocations=190322 (full 281):
 par new generation   total 737280K, used 67732K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690, 
0x0000000422e00000)
  from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988, 
0x000000042ce00000)
  to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538, 
0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 14814420K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
}
GC locker: Trying a full collection because scavenge failed
{Heap before GC invocations=190322 (full 281):
 par new generation   total 737280K, used 67732K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690, 
0x0000000422e00000)
  from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988, 
0x000000042ce00000)
  to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538, 
0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 14814420K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
2013-09-20T07:32:59.212+0100: 141675.242: [Full GC 141675.242: 
[CMS2013-09-20T07:33:06.691+0100: 141682.721: [CMS-concurrent-sweep: 
19.719/29.300 secs] [Times: user=146.19 sys=3.74, real=29.31 secs] 
 (concurrent mode failure): 14814420K->7653329K(15958016K), 35.8052520 secs] 
14882152K->7653329K(16695296K), [CMS Perm : 26411K->26355K(44188K)], 35.8055760 
secs] [Times: user=35.74 sys=0.04, real=35.80 secs] 
Heap after GC invocations=190323 (full 282):
 par new generation   total 737280K, used 0K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 
0x0000000422e00000)
  from space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000, 
0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 
0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 7653329K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26355K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 35.9305810 seconds
Total time for which application threads were stopped: 0.0030240 seconds


...


2013-09-20T09:58:41.765+0100: 150417.795: [GC 150417.796: [ParNew
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:   83887848 bytes,   83887848 total
: 674422K->81920K(737280K), 0.0680000 secs] 15824090K->15233726K(16695296K), 
0.0685120 secs] [Times: user=0.54 sys=0.00, real=0.07 secs] 
Heap after GC invocations=214529 (full 323):
 par new generation   total 737280K, used 81920K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 
0x0000000422e00000)
  from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 
0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 
0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 15151806K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 0.0721100 seconds
{Heap before GC invocations=214529 (full 323):
 par new generation   total 737280K, used 82214K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 
0x0000000422e00000)
  from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 
0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 
0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 15151806K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
2013-09-20T09:58:41.837+0100: 150417.868: [GC 150417.868: [ParNew (0: promotion 
failure size = 1677)  (1: promotion failure size = 2584)  (2: promotion failure 
size = 2584)  (3: promotion failure size = 1723)  (4: promotion failure size = 
1723)  (7: promotion failure size
 = 4766)  (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:      10688 bytes,      10688 total
: 82214K->62920K(737280K), 0.4677070 secs] 15234020K->15298061K(16695296K), 
0.4681720 secs] [Times: user=1.15 sys=0.01, real=0.46 secs] 
Heap after GC invocations=214530 (full 323):
 par new generation   total 737280K, used 62920K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 
0x0000000422e00000)
  from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940, 
0x0000000427e00000)
  to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 
0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 15235140K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
}
GC locker: Trying a full collection because scavenge failed
{Heap before GC invocations=214530 (full 323):
 par new generation   total 737280K, used 62920K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 
0x0000000422e00000)
  from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940, 
0x0000000427e00000)
  to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 
0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 15235140K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
2013-09-20T09:58:42.306+0100: 150418.336: [Full GC 150418.336: 
[CMS2013-09-20T09:58:46.720+0100: 150422.751: [CMS-concurrent-preclean: 
7.057/10.443 secs] [Times: user=45.24 sys=0.32, real=10.44 secs] 
 (concurrent mode failure): 15235140K->5325067K(15958016K), 26.3214360 secs] 
15298061K->5325067K(16695296K), [CMS Perm : 26424K->26343K(44188K)], 26.3218650 
secs] [Times: user=26.29 sys=0.02, real=26.33 secs] 
Heap after GC invocations=214531 (full 324):
 par new generation   total 737280K, used 0K [0x00000003fae00000, 
0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 
0x0000000422e00000)
  from space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 
0x0000000427e00000)
  to   space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000, 
0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 5325067K 
[0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26343K [0x00000007fae00000, 
0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 26.7940000 seconds
Total time for which application threads were stopped: 0.0408350 seconds
Total time for which application threads were stopped: 0.0264510 seconds



Thanks for the help,
André Cruz

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

Reply via email to