Dimetrio,

Look at my last post: I showed you how to turn on all the useful GC logging flags. From those logs we can get real information on why your GC pauses are so long. From the changes you have made so far, it seems you are changing things without knowing their effect.
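As a refresher, this is the shape of what I suggested -- a minimal sketch for cassandra-env.sh, assuming a Sun/Oracle HotSpot JVM; the log path is just an example:

  # Verbose GC logging (cassandra-env.sh)
  JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
  JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
  JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
  JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
  # CMS free-list statistics, if you are running CMS
  JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
  # class histograms around full GCs show what is actually filling the heap
  JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
  JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"

With those in place, the tenuring distribution and the class histograms let us reason from data instead of guessing.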
Here are a few things to consider:

- Having a 9GB NewGen out of a 16GB heap is a recipe for disaster. I am sure that if you turn on GC logs, you will see lots of promotion failures. The standard guidance is for NewGen to be at most 1/4 of your heap, to allow for healthy promotions.

- The jstat output suggests that the survivor spaces aren't being utilized, which is one sign of premature promotion. Consider increasing MaxTenuringThreshold to a value higher than it is now; the higher it is, the slower things get promoted out of Eden. But we should really examine your GC logs before making this part of the resolution.

- If you are going with a 16GB heap, then reduce your NewGen to 1/4 of it (see the sketch after this list).

- It seems you have lowered compaction so much that SSTables aren't compacting fast enough; tpstats should tell you something about this if my assumption is true.
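To make the sizing concrete, something along these lines in cassandra-env.sh -- the numbers are illustrative starting points, not a drop-in fix, and the tenuring value in particular is a guess to be validated against the tenuring distribution in your GC log:

  MAX_HEAP_SIZE="16G"
  HEAP_NEWSIZE="4G"    # ~1/4 of the heap, instead of 9600M

  # Example value only: let objects age in the survivor spaces instead of
  # being promoted straight out of Eden.
  JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"

And to check my compaction assumption:

  nodetool tpstats          # look for pending/blocked CompactionExecutor tasks
  nodetool compactionstats  # a steadily growing pending count means compaction is falling behind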
I also agree with Jonathan about the data model and access pattern issues. It seems your queries are creating long rows with lots of tombstones. If you are deleting lots of columns from a single row, writing more columns to it, and then fetching many columns at once, you end up reading a large row that sits in the heap while it is processed, causing long GCs. The GC histograms inside the GC logs (once you enable them) should tell you what is in the heap -- usually either columns from slice queries or columns from compaction; those two are the common cases in my experience of tuning GC pauses.

Hope this helps

On Mon, Jan 27, 2014 at 4:07 AM, Dimetrio <dimet...@flysoft.ru> wrote:

> None of the advice helped me reduce the GC load.
>
> I tried these:
>
> MAX_HEAP_SIZE from the default (8GB) to 16G, with HEAP_NEWSIZE from 400M to 9600M
> key cache on/off
> compaction memory size and other limits
> 15 c3.4xlarge nodes (adding 5 nodes to the 10-node cluster didn't help)
> and many other things
>
> Reads ~5000 ops/s
> Writes ~5000 ops/s
> max batch is 50
> heavy reads and heavy writes (and heavy deletes)
>
> Sometimes I get messages like:
> Read 1001 live and 2691
> Read 12 live and 2796
>
> sudo jstat -gcutil -h15 `sudo cat /var/run/cassandra/cassandra.pid` 250ms 0
> S0     S1     E      O      P      YGC    YGCT    FGC   FGCT    GCT
> 18.93  0.00   4.52   75.36  59.77  225    30.119  18    28.361  58.480
> 0.00   13.12  3.78   81.09  59.77  226    30.193  18    28.617  58.810
> 0.00   13.12  39.50  81.09  59.78  226    30.193  18    28.617  58.810
> 0.00   13.12  80.70  81.09  59.78  226    30.193  18    28.617  58.810
> 17.21  9.13   0.66   87.38  59.78  228    30.235  18    28.617  58.852
> 0.00   10.96  29.43  87.89  59.78  228    30.328  18    28.617  58.945
> 0.00   10.96  62.67  87.89  59.78  228    30.328  18    28.617  58.945
> 0.00   10.96  96.62  87.89  59.78  228    30.328  18    28.617  58.945
> 0.00   10.69  10.29  94.56  59.78  230    30.462  18    28.617  59.078
> 0.00   10.69  38.08  94.56  59.78  230    30.462  18    28.617  59.078
> 0.00   10.69  71.70  94.56  59.78  230    30.462  18    28.617  59.078
> 15.91  6.24   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> S0     S1     E      O      P      YGC    YGCT    FGC   FGCT    GCT
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
> 15.91  8.02   0.03   99.96  59.78  232    30.506  18    28.617  59.123
>
> $ nodetool cfhistograms Social home_timeline
> Social/home_timeline histograms
> Offset    SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>                     (micros)       (micros)      (bytes)
> 1         10458     0              0             0               26330
> 2         72428     0              0             0               0
> 3         339490    11             0             0               42398
> 4         661819    156            0             0               0
> 5         67186     893            0             0               0
> 6         33284     3064           0             0               15907
> 7         41287     10542          0             0               0
> 8         49085     34689          0             0               0
> 10        82941     244013         0             0               19068
> 12        49438     523284         0             0               17726
> 14        45109     724026         0             0               0
> 17        69144     1181873        0             0               18888
> 20        52563     1039934        0             0               9859
> 24        63041     1007711        0             0               16223
> 29        53042     702118         34            0               6289
> 35        31725     395098         78            0               10434
> 42        25571     212531         81            0               13735
> 50        11984     120692         69            0               8020
> 60        3985      71142          51            0               12874
> 72        4681      44988          52            0               10466
> 86        0         31096          50            12336           9162
> 103       0         25435          41            0               11421
> 124       0         20666          35            0               11204
> 149       0         17810          42            4799            9953
> 179       0         18568          45            22462           12347
> 215       0         20129          89            1492            12628
> 258       0         26381          174           6605            13687
> 310       0         34731          412           19060           12978
> 372       0         58609          577           3848            17080
> 446       0         59318          1355          9504            13231
> 535       0         16462          2778          10685           13791
> 642       0         9985           6555          13931           12703
> 770       0         6403           14811         16859           11261
> 924       0         4149           34059         12032           12311
> 1109      0         3910           77621         8868            11829
> 1331      0         3617           154224        12096           12886
> 1597      0         3400           234929        10006           11738
> 1916      0         3156           264185        9443            11546
> 2299      0         2862           199240        10557           12011
> 2759      0         2548           145502        10370           12039
> 3311      0         1809           125259        10600           14861
> 3973      0         1401           83855         10081           13866
> 4768      0         1035           69407         10028           13341
> 5722      0         795            55982         10594           12707
> 6866      0         688            47374         10450           11429
> 8239      0         338            37722         10608           10324
> 9887      0         277            34395         12180           9418
> 11864     0         255            31126         12896           8542
> 14237     0         223            25607         12829           7796
> 17084     0         178            20661         15812           6803
> 20501     0         193            15916         15590           6128
> 24601     0         198            12606         14920           5290
> 29521     0         186            11137         14068           4598
> 35425     0         195            11111         14377           3891
> 42510     0         138            9597          14468           3439
> 51012     0         123            7032          14298           2701
> 61214     0         413            5288          15194           2333
> 73457     0         1911           3991          13549           1942
> 88148     0         940            3392          13178           1595
> 105778    0         427            3045          12835           1234
> 126934    0         431            2691          12676           1080
> 152321    0         987            2504          15068           787
> 182785    0         484            2337          14566           600
> 219342    0         595            2052          13664           516
> 263210    0         480            1736          13538           377
> 315852    0         428            1428          12007           316
> 379022    0         367            1093          10802           240
> 454826    0         330            816           9819            172
> 545791    0         285            555           8931            155
> 654949    0         240            339           8153            121
> 785939    0         168            201           7212            99
> 943127    0         154            103           6477            64
> 1131752   0         141            89            5693            65
> 1358102   0         135            87            4885            51
> 1629722   0         97             66            4210            42
> 1955666   0         88             18            3689            38
> 2346799   0         78             7             3027            18
> 2816159   0         73             21            2528            23
> 3379391   0         120            54            2104            11
> 4055269   0         70             107           1797            7
> 4866323   0         118            37            1429            3
> 5839588   0         56             73            1155            3
> 7007506   0         31             52            895             4
> 8409007   0         36             52            678             3
> 10090808  0         29             4             614             4
> 12108970  0         34             17            402             2
> 14530764  0         31             7             369             3
> 17436917  0         22             47            229             2
> 20924300  0         20             1             195             0
> 25109160  0         34             36            154             1
>
> I'm at an impasse.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/GC-eden-filled-instantly-any-size-Dropping-messages-tp7592447.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

--
Cheers,
-Arya