cfhistograms output for the two tables I believe are most likely the issue is below, taken from the node that most recently presented the problem. Any ideas? Note that these tables use LCS and have droppable tombstone ratios of 27% for global_user_event_skus_v2 and 2.7% for the other. Table definitions are also below; you can see that they do use compound primary keys.

Thanks
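(For reference, these numbers can be reproduced with nodetool plus the sstablemetadata tool shipped in tools/bin; a minimal sketch — the keyspace name "users" comes from the histogram headers below, while the data-directory path is an assumption for a stock package install:)

    # Per-table histograms: SSTables per read, latencies, partition size, cell count
    nodetool cfhistograms users global_user_event_skus_v2
    nodetool cfhistograms users global_user_events

    # Estimated droppable tombstones, reported per SSTable -- adjust the path
    # to your data directory if it differs.
    for f in /var/lib/cassandra/data/users/global_user_event_skus_v2/*-Data.db; do
        echo "$f"; sstablemetadata "$f" | grep -i droppable
    done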
CREATE TABLE global_user_event_skus_v2 (
  user_id timeuuid,
  app_id int,
  type text,
  name text,
  sku text,
  creationtime timestamp,
  PRIMARY KEY (user_id, app_id, type, name, sku)
) WITH
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=86400 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=2592000 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

users/global_user_event_skus_v2 histograms

SSTables per Read
1 sstables: 1398122
2 sstables: 231100
3 sstables: 97334
4 sstables: 56496
5 sstables: 43046
6 sstables: 35471
7 sstables: 26888
8 sstables: 21102
10 sstables: 29522
12 sstables: 18314
14 sstables: 11368
17 sstables: 10283
20 sstables: 5766
24 sstables: 3554
29 sstables: 1681
35 sstables: 521
42 sstables: 102
50 sstables: 1

Write Latency (microseconds)
2 us: 532
3 us: 10728
4 us: 49608
5 us: 173718
6 us: 404131
7 us: 697468
8 us: 981048
10 us: 2643680
12 us: 3183727
14 us: 3150881
17 us: 3746333
20 us: 2298481
24 us: 1647604
29 us: 1070733
35 us: 693300
42 us: 442120
50 us: 288618
60 us: 205061
72 us: 141194
86 us: 98781
103 us: 73629
124 us: 54900
149 us: 40518
179 us: 30330
215 us: 23953
258 us: 18875
310 us: 14147
372 us: 10509
446 us: 7984
535 us: 6257
642 us: 4903
770 us: 4086
924 us: 3252
1109 us: 2676
1331 us: 2129
1597 us: 1753
1916 us: 1488
2299 us: 1237
2759 us: 1078
3311 us: 866
3973 us: 704
4768 us: 551
5722 us: 553
6866 us: 774
8239 us: 443
9887 us: 303
11864 us: 222
14237 us: 181
17084 us: 146
20501 us: 147
24601 us: 499
29521 us: 842
35425 us: 548
42510 us: 303
51012 us: 560
61214 us: 842
73457 us: 600
88148 us: 283
105778 us: 173
126934 us: 81
152321 us: 18
182785 us: 2

Read Latency (microseconds)
35 us: 2
42 us: 72
50 us: 737
60 us: 3555
72 us: 6926
86 us: 8373
103 us: 15287
124 us: 30054
149 us: 46164
179 us: 67828
215 us: 86741
258 us: 92178
310 us: 82205
372 us: 69641
446 us: 64710
535 us: 58295
642 us: 55247
770 us: 57918
924 us: 56928
1109 us: 75762
1331 us: 122306
1597 us: 141564
1916 us: 117301
2299 us: 102293
2759 us: 98789
3311 us: 86923
3973 us: 75304
4768 us: 64012
5722 us: 54973
6866 us: 47221
8239 us: 41483
9887 us: 35015
11864 us: 30249
14237 us: 23505
17084 us: 18364
20501 us: 12992
24601 us: 8837
29521 us: 6497
35425 us: 4840
42510 us: 3536
51012 us: 2471
61214 us: 2784
73457 us: 3664
88148 us: 3219
105778 us: 1980
126934 us: 1050
152321 us: 519
182785 us: 184
219342 us: 82
263210 us: 39
315852 us: 22
379022 us: 16
454826 us: 0
545791 us: 0
654949 us: 1
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 3
1955666 us: 3
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 4

Partition Size (bytes)
35 bytes: 31
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 0
86 bytes: 0
103 bytes: 761108
124 bytes: 23596164
149 bytes: 22963275
179 bytes: 3517971
215 bytes: 6395335
258 bytes: 6197982
310 bytes: 4662589
372 bytes: 5261934
446 bytes: 3032032
535 bytes: 3775310
642 bytes: 2879935
770 bytes: 3002743
924 bytes: 2279900
1109 bytes: 1976357
1331 bytes: 1643875
1597 bytes: 1422199
1916 bytes: 1142746
2299 bytes: 901584
2759 bytes: 760173
3311 bytes: 596874
3973 bytes: 480502
4768 bytes: 380185
5722 bytes: 303315
6866 bytes: 239013
8239 bytes: 187337
9887 bytes: 147218
11864 bytes: 115684
14237 bytes: 89718
17084 bytes: 69160
20501 bytes: 53070
24601 bytes: 40496
29521 bytes: 30344
35425 bytes: 22656
42510 bytes: 16419
51012 bytes: 11621
61214 bytes: 8020
73457 bytes: 5561
88148 bytes: 3708
105778 bytes: 2434
126934 bytes: 1561
152321 bytes: 1017
182785 bytes: 559
219342 bytes: 349
263210 bytes: 181
315852 bytes: 121
379022 bytes: 66
454826 bytes: 22
545791 bytes: 14
654949 bytes: 10
785939 bytes: 2
943127 bytes: 0
1131752 bytes: 4
1358102 bytes: 2
1629722 bytes: 1
1955666 bytes: 2
2346799 bytes: 0
2816159 bytes: 0
3379391 bytes: 0
4055269 bytes: 0
4866323 bytes: 1

Cell Count per Partition
1 cells: 50599458
2 cells: 13975363
3 cells: 7927898
4 cells: 4981142
5 cells: 3081166
6 cells: 3034895
7 cells: 1896720
8 cells: 1508849
10 cells: 2418845
12 cells: 1631289
14 cells: 1179269
17 cells: 1301647
20 cells: 919744
24 cells: 864313
29 cells: 735203
35 cells: 597749
42 cells: 469611
50 cells: 367422
60 cells: 312697
72 cells: 251782
86 cells: 197717
103 cells: 161244
124 cells: 132074
149 cells: 103711
179 cells: 81760
215 cells: 63360
258 cells: 48982
310 cells: 37932
372 cells: 28328
446 cells: 21347
535 cells: 15373
642 cells: 10959
770 cells: 7643
924 cells: 5346
1109 cells: 3613
1331 cells: 2291
1597 cells: 1558
1916 cells: 885
2299 cells: 577
2759 cells: 327
3311 cells: 169
3973 cells: 116
4768 cells: 57
5722 cells: 23
6866 cells: 15
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 1

CREATE TABLE global_user_events (
  user_id timeuuid,
  app_id int,
  type text,
  name text,
  extra_param map<text, text>,
  paid boolean,
  value float,
  PRIMARY KEY (user_id, app_id, type, name)
) WITH
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=86400 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

users/global_user_events histograms

SSTables per Read
1 sstables: 1024446
2 sstables: 632948
3 sstables: 230768
4 sstables: 112428
5 sstables: 82651
6 sstables: 61439
7 sstables: 47518
8 sstables: 38490
10 sstables: 58054
12 sstables: 41128
14 sstables: 29642
17 sstables: 31056
20 sstables: 20474
24 sstables: 17799
29 sstables: 13744
35 sstables: 9485
42 sstables: 5933
50 sstables: 3607
60 sstables: 1618
72 sstables: 263
86 sstables: 10

Write Latency (microseconds)
10 us: 10
12 us: 75
14 us: 319
17 us: 3550
20 us: 20617
24 us: 105254
29 us: 311290
35 us: 494404
42 us: 516301
50 us: 534061
60 us: 421438
72 us: 314377
86 us: 375240
103 us: 333402
124 us: 816891
149 us: 667832
179 us: 430659
215 us: 336063
258 us: 380606
310 us: 235592
372 us: 68679
446 us: 14988
535 us: 6169
642 us: 3763
770 us: 2591
924 us: 2270
1109 us: 1985
1331 us: 1744
1597 us: 1564
1916 us: 1489
2299 us: 1326
2759 us: 1114
3311 us: 1024
3973 us: 984
4768 us: 833
5722 us: 962
6866 us: 1076
8239 us: 891
9887 us: 611
11864 us: 489
14237 us: 324
17084 us: 306
20501 us: 222
24601 us: 426
29521 us: 791
35425 us: 820
42510 us: 534
51012 us: 530
61214 us: 946
73457 us: 1122
88148 us: 714
105778 us: 313
126934 us: 127
152321 us: 38
182785 us: 3
219342 us: 2
263210 us: 0
315852 us: 0
379022 us: 0
454826 us: 2
545791 us: 0
654949 us: 3
785939 us: 0
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 0
1955666 us: 0
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 1

Read Latency (microseconds)
42 us: 1
50 us: 2
60 us: 2
72 us: 4
86 us: 15
103 us: 23
124 us: 29
149 us: 48
179 us: 67
215 us: 80
258 us: 316
310 us: 1812
372 us: 4999
446 us: 8876
535 us: 14586
642 us: 24417
770 us: 31943
924 us: 37453
1109 us: 77449
1331 us: 139135
1597 us: 177538
1916 us: 169568
2299 us: 177617
2759 us: 202552
3311 us: 198101
3973 us: 179807
4768 us: 159149
5722 us: 142282
6866 us: 126124
8239 us: 111529
9887 us: 97658
11864 us: 84913
14237 us: 70905
17084 us: 54917
20501 us: 40623
24601 us: 29545
29521 us: 21448
35425 us: 15255
42510 us: 11222
51012 us: 8150
61214 us: 7473
73457 us: 10060
88148 us: 10530
105778 us: 7573
126934 us: 4331
152321 us: 2097
182785 us: 836
219342 us: 255
263210 us: 84
315852 us: 28
379022 us: 8
454826 us: 2
545791 us: 0
654949 us: 2
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 2
1955666 us: 6
2346799 us: 18
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 34

Partition Size (bytes)
35 bytes: 25
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 21761
86 bytes: 694466658
103 bytes: 219007195
124 bytes: 7034596
149 bytes: 110269079
179 bytes: 101276811
215 bytes: 38518027
258 bytes: 103242356
310 bytes: 51434914
372 bytes: 38048245
446 bytes: 42558831
535 bytes: 32869176
642 bytes: 17188026
770 bytes: 14750559
924 bytes: 15963256
1109 bytes: 16125324
1331 bytes: 11462272
1597 bytes: 6810452
1916 bytes: 5685739
2299 bytes: 10232324
2759 bytes: 5138522
3311 bytes: 5847998
3973 bytes: 4065752
4768 bytes: 3468263
5722 bytes: 2545431
6866 bytes: 1701913
8239 bytes: 993808
9887 bytes: 495282
11864 bytes: 208644
14237 bytes: 74307
17084 bytes: 22286
20501 bytes: 5474
24601 bytes: 1225
29521 bytes: 210
35425 bytes: 97
42510 bytes: 48
51012 bytes: 21
61214 bytes: 15
73457 bytes: 10
88148 bytes: 3
105778 bytes: 4
126934 bytes: 1
152321 bytes: 2
182785 bytes: 1

Cell Count per Partition
1 cells: 914323128
2 cells: 196020828
3 cells: 165502164
4 cells: 71735648
5 cells: 54450466
6 cells: 28995756
7 cells: 9625976
8 cells: 10257788
10 cells: 24004757
12 cells: 13087147
14 cells: 13588347
17 cells: 11976353
20 cells: 6245374
24 cells: 5627821
29 cells: 9480533
35 cells: 6008276
42 cells: 5221357
50 cells: 4355454
60 cells: 3955488
72 cells: 2872578
86 cells: 1914447
103 cells: 1216276
124 cells: 652035
149 cells: 279565
179 cells: 99836
215 cells: 29132
258 cells: 6682
310 cells: 1276
372 cells: 258
446 cells: 96
535 cells: 48
642 cells: 25
770 cells: 9
924 cells: 9
1109 cells: 4
1331 cells: 3
1597 cells: 1
1916 cells: 2

From: Jack Krupansky <j...@basetechnology.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 4:07 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Hot, large row

Could it be some “fat columns” (cells with large blob or text values) rather than the cell count per se? IOW, a “big row” rather than a “wide row”?

And, could it be a large partition rather than a large row (many rows in a single partition)? Are clustering columns being used in the primary key?

-- Jack Krupansky
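(On Jack's last two questions, the schemas above are informative: both tables use compound primary keys, so user_id alone is the partition key and app_id/type/name(/sku) are clustering columns, meaning all of a user's events accumulate in a single partition. A quick way to check for outlier partition sizes — a sketch, assuming the 2.0-era cfstats labels; if your nodetool doesn't accept a keyspace.table filter, run it bare and search the output:)

    # Largest and mean compacted partition ("row") size per table:
    nodetool cfstats users.global_user_event_skus_v2 | grep -E "Compacted row (maximum|mean) size"
    nodetool cfstats users.global_user_events | grep -E "Compacted row (maximum|mean) size"

(The Partition Size histograms above point the same way: the largest bucket hit is about 4.9 MB for global_user_event_skus_v2 and about 183 KB for global_user_events, so this looks like moderately large partitions rather than individual fat cells.)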
From: DuyHai Doan <doanduy...@gmail.com>
Sent: Thursday, July 24, 2014 3:53 PM
To: user@cassandra.apache.org
Subject: Re: Hot, large row

Your extract of cfhistograms shows that there are no particular "wide rows". The widest has 61214 cells, which is big but not so huge as to be a real concern.

Turning on trace probability only gives you some "hints" about what kinds of queries are being run; it does not give the exact partition key or other statement values, especially when you are using prepared statements ...

"I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs." --> Don't do that, you'll only sweep the dust under the carpet. Find the real issue and fix it instead of changing a parameter to hide it.

One solution would be, on the client side, to activate some logging to show the CQL3 statements the application is issuing that may overload the server. I know that's easier said than done, but I don't have any other idea for the moment.

-------- Shameless self-promotion --------

To support this kind of live prod debugging & investigation, I added a new dynamic query logging feature in Achilles: https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging

Once you hit the issue, this kind of feature may save your day...

On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kwri...@nanigans.com> wrote:

I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested, but the output doesn't appear to tell me what row was actually read unless I missed something; I just see "executing prepared statement". Any ideas how I can find the row in question? I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.

Any help in identifying my issue would be GREATLY appreciated.

Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
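(A concrete way to run the trace-and-inspect loop DuyHai describes — a sketch; the system_traces schema here is the stock 2.0 one, <host> and the session id are placeholders, and, as noted in the reply above, prepared-statement traces won't reveal the bound partition key:)

    # Sample a small fraction of requests on the hot node, then turn it off.
    nodetool settraceprobability 0.001
    sleep 300    # collect for ~5 minutes
    nodetool settraceprobability 0

    # Inspect what was captured; duration is in microseconds.
    cqlsh <host> <<'EOF'
    SELECT session_id, started_at, duration, request FROM system_traces.sessions LIMIT 50;
    -- then, for a suspicious session_id:
    -- SELECT activity, source, source_elapsed FROM system_traces.events WHERE session_id = ...;
    EOF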
From: DuyHai Doan <doanduy...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Hot, large row

"How can I detect wide rows?"

--> nodetool cfhistograms <keyspace> <suspected column family>

Look at the "Column count" column (the last one) and identify lines where the "Offset" value is very high. In a well-designed application you should have a Gaussian distribution where 80% of your rows have a similar number of columns.

"Anyone know what debug level I can set so that I can see what reads the hot node is handling?"

--> "nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you encounter the issue. Activate the tracing for a while (5 mins) and deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since only one request in every 1000 is traced.

"Any way to get the server to blacklist these wide rows automatically?"

--> No

On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kwri...@nanigans.com> wrote:

Hi all,

We are seeing an issue where, basically daily, one of our nodes spikes in load and churns in CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we're missing a case. Hoping for assistance on the following questions:

* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I'm hoping to see the "bad" row
* Any way to get the server to blacklist these wide rows automatically?

We're using C* 2.0.6 with Vnodes.

Thanks
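(One more option for the first question: when a row is too large to compact in memory, 2.0 logs its key during compaction, which yields the actual offending row keys rather than a distribution. A sketch, assuming the default packaged log location; verify the exact message text against your version's logs:)

    # Keys of rows that exceeded in_memory_compaction_limit_in_mb show up here:
    grep "Compacting large row" /var/log/cassandra/system.log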