What are your JVM settings? Your read pattern implies that you may pull a lot of data into memory at once (reading all SKUs for a given user), which could be putting too much pressure on the JVM.
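If that full-partition read really is the main access path, driver-side paging at least keeps each round trip (and the heap it needs) bounded. A minimal sketch, assuming the DataStax Java Driver 2.x API and the global_user_event_skus_v2 table shown further down this thread; the contact point and the fetch size of 500 are arbitrary placeholders for illustration:

import java.util.UUID;

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class PagedSkuRead {
    public static void main(String[] args) {
        // Contact point and keyspace name are placeholders for this sketch.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("users");

        PreparedStatement ps = session.prepare(
            "SELECT sku, creationtime FROM global_user_event_skus_v2 WHERE user_id = ?");

        UUID userId = UUID.fromString(args[0]);
        BoundStatement stmt = ps.bind(userId);
        // Ask the server for at most 500 rows per page instead of the whole partition;
        // iterating the result set transparently fetches the next page when one is exhausted.
        stmt.setFetchSize(500);

        for (Row row : session.execute(stmt)) {
            System.out.println(row.getString("sku") + " " + row.getDate("creationtime"));
        }

        cluster.close();
    }
}

With paging only one page of cells is materialized on the coordinator and in the client at a time, so a user with a very wide partition degrades latency rather than heap.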
Did you use native paging of Java Driver to avoid loading all columns at a time? And the loading all skus for one user, is it a rare scenario or is it your main use case for this colum family? Le 25 juil. 2014 04:11, "Keith Wright" <kwri...@nanigans.com> a écrit : > One last item to add to this thread: we have consistently experienced > this behavior where over time performance degrades (previously we were > unable to bootstrap nodes to due long GC pauses from existing nodes). I > believe its due to tombstone build up (as I mentioned previously one of the > tables mentioned is showing a droppable tombstone ratio of > 30%). The > sku table is used to hold SKUs that recent users viewed/purchased. When we > write a new SKU we set the TTL to 30 days where the row key is the user id; > our read case is to fetch ALL skus the user has seen within the TTL. Since > the user sees SKUs consistently over time, this can result in a row with > many columns much of which are likely tombstoned (see CASSANDRA-6654 which > I filed for this which shows that C* does not handle this case well). > > I guess I’m just surprised that others aren’t using C* for similar usage > cases and thus having the same issue? > > I am hoping to upgrade to 2.0.9 which has improvements to remove > tombstones. > > From: Keith Wright <kwri...@nanigans.com> > Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> > Date: Thursday, July 24, 2014 at 4:50 PM > To: "user@cassandra.apache.org" <user@cassandra.apache.org> > Cc: Don Jackson <djack...@nanigans.com> > Subject: Re: Hot, large row > > When a node is showing the high CMS issue, io is actually low likely due > to the fact that none is going on during CMS GC. On a node not showing the > issue, iostat shows disk usage around 50% (these are SSD) and load hovers > around 10 for a dual octo core machine this is fine. > > In addition, nodetool compactionstats does not show that we are falling > behind in compactions. > > So not sure what is going on here. We are running CentOS 6.5 with java > 1.7.0_51. It does seem like things are getting worse and I’m considering > dropping and rebuilding all the tables (as I have the data in Hadoop). > This seems to be a repeated problem for us with Cassandra and now that > Aerospike has an open source version, we are very much considering > switching. > > Thanks again for the help and any insight you might have! > > avg-cpu: %user %nice %system %iowait %steal %idle > > 23.43 12.40 11.40 6.20 0.00 46.57 > > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz > avgqu-sz await svctm %util > > sda 0.00 0.00 720.50 0.00 5.51 0.00 15.66 > 1.05 1.54 0.57 41.35 > > sdc 0.00 0.00 5930.50 2100.50 55.42 8.21 16.23 > 2.30 0.29 0.06 51.25 > > sdb 0.00 0.00 6951.50 2052.50 65.82 8.02 16.80 > 4.31 0.48 0.07 59.60 > > > avg-cpu: %user %nice %system %iowait %steal %idle > > 9.48 14.72 5.60 3.67 0.00 66.52 > > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz > avgqu-sz await svctm %util > > sda 0.00 0.00 327.00 19.50 2.55 0.08 15.53 > 0.04 0.13 0.12 4.25 > > sdc 0.00 0.00 3974.50 1403.50 36.37 5.48 15.94 > 0.99 0.18 0.08 45.10 > > sdb 0.00 0.00 4357.50 1535.00 40.83 6.00 16.28 > 1.10 0.19 0.08 47.45 > > From: DuyHai Doan <doanduy...@gmail.com> > Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> > Date: Thursday, July 24, 2014 at 4:44 PM > To: "user@cassandra.apache.org" <user@cassandra.apache.org> > Subject: Re: Hot, large row > > For global_user_event_skus_v2 > > 1. number of SSTables per read is quite huge. 
Considering you're using > LCS, it means that LCS cannot keep up with write rate and is left behind. > AFAIK LCS is using SizeTieredCompaction at L0 to cope with extreme write > burst. Your high number of SSTables per read is quite fishy here. > > 2. Write latency is widespead up to 73.457 millisecs, meaning that your > node is getting behind for write in some cases. Most of write are still > below 1 millisec but we don't care. What we care here is the large trail of > write latency climbing up to 73 millisecs > > 3. Same remarks for read latency which is worst because the distribution > is even "flatter", worst cases going up to 100 ms. > > If I were you, I'll check for disk I/O first and maybe CPU usage > > > On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kwri...@nanigans.com> > wrote: > >> Cfhistograms for the tables I believe are most likely the issue are below >> on the node that most recently presented the issue. Any ideas? Note that >> these tables are LCS and have droppable tombstone ratios of 27% for >> global_user_event_skus_v2 and 2.7% for the other. Table definitions also >> below, you can see that they do use compound keys. >> >> Thanks >> >> CREATE TABLE global_user_event_skus_v2 ( >> >> user_id timeuuid, >> >> app_id int, >> >> type text, >> >> name text, >> >> sku text, >> >> creationtime timestamp, >> >> PRIMARY KEY (user_id, app_id, type, name, sku) >> >> ) WITH >> >> bloom_filter_fp_chance=0.100000 AND >> >> caching='KEYS_ONLY' AND >> >> comment='' AND >> >> dclocal_read_repair_chance=0.000000 AND >> >> gc_grace_seconds=86400 AND >> >> index_interval=128 AND >> >> read_repair_chance=0.100000 AND >> >> replicate_on_write='true' AND >> >> populate_io_cache_on_flush='false' AND >> >> default_time_to_live=2592000 AND >> >> speculative_retry='99.0PERCENTILE' AND >> >> memtable_flush_period_in_ms=0 AND >> >> compaction={'class': 'LeveledCompactionStrategy'} AND >> >> compression={'sstable_compression': 'LZ4Compressor'}; >> >> >> users/global_user_event_skus_v2 histograms >> >> >> SSTables per Read >> >> 1 sstables: 1398122 >> >> 2 sstables: 231100 >> >> 3 sstables: 97334 >> >> 4 sstables: 56496 >> >> 5 sstables: 43046 >> >> 6 sstables: 35471 >> >> 7 sstables: 26888 >> >> 8 sstables: 21102 >> >> 10 sstables: 29522 >> >> 12 sstables: 18314 >> >> 14 sstables: 11368 >> >> 17 sstables: 10283 >> >> 20 sstables: 5766 >> >> 24 sstables: 3554 >> >> 29 sstables: 1681 >> >> 35 sstables: 521 >> >> 42 sstables: 102 >> >> 50 sstables: 1 >> >> >> Write Latency (microseconds) >> >> 2 us: 532 >> >> 3 us: 10728 >> >> 4 us: 49608 >> >> 5 us: 173718 >> >> 6 us: 404131 >> >> 7 us: 697468 >> >> 8 us: 981048 >> >> 10 us: 2643680 >> >> 12 us: 3183727 >> >> 14 us: 3150881 >> >> 17 us: 3746333 >> >> 20 us: 2298481 >> >> 24 us: 1647604 >> >> 29 us: 1070733 >> >> 35 us: 693300 >> >> 42 us: 442120 >> >> 50 us: 288618 >> >> 60 us: 205061 >> >> 72 us: 141194 >> >> 86 us: 98781 >> >> 103 us: 73629 >> >> 124 us: 54900 >> >> 149 us: 40518 >> >> 179 us: 30330 >> >> 215 us: 23953 >> >> 258 us: 18875 >> >> 310 us: 14147 >> >> 372 us: 10509 >> >> 446 us: 7984 >> >> 535 us: 6257 >> >> 642 us: 4903 >> >> 770 us: 4086 >> >> 924 us: 3252 >> >> 1109 us: 2676 >> >> 1331 us: 2129 >> >> 1597 us: 1753 >> >> 1916 us: 1488 >> >> 2299 us: 1237 >> >> 2759 us: 1078 >> >> 3311 us: 866 >> >> 3973 us: 704 >> >> 4768 us: 551 >> >> 5722 us: 553 >> >> 6866 us: 774 >> >> 8239 us: 443 >> >> 9887 us: 303 >> >> 11864 us: 222 >> >> 14237 us: 181 >> >> 17084 us: 146 >> >> 20501 us: 147 >> >> 24601 us: 499 >> >> 29521 us: 842 >> >> 35425 
us: 548 >> >> 42510 us: 303 >> >> 51012 us: 560 >> >> 61214 us: 842 >> >> 73457 us: 600 >> >> 88148 us: 283 >> >> 105778 us: 173 >> >> 126934 us: 81 >> >> 152321 us: 18 >> >> 182785 us: 2 >> >> >> Read Latency (microseconds) >> >> 35 us: 2 >> >> 42 us: 72 >> >> 50 us: 737 >> >> 60 us: 3555 >> >> 72 us: 6926 >> >> 86 us: 8373 >> >> 103 us: 15287 >> >> 124 us: 30054 >> >> 149 us: 46164 >> >> 179 us: 67828 >> >> 215 us: 86741 >> >> 258 us: 92178 >> >> 310 us: 82205 >> >> 372 us: 69641 >> >> 446 us: 64710 >> >> 535 us: 58295 >> >> 642 us: 55247 >> >> 770 us: 57918 >> >> 924 us: 56928 >> >> 1109 us: 75762 >> >> 1331 us: 122306 >> >> 1597 us: 141564 >> >> 1916 us: 117301 >> >> 2299 us: 102293 >> >> 2759 us: 98789 >> >> 3311 us: 86923 >> >> 3973 us: 75304 >> >> 4768 us: 64012 >> >> 5722 us: 54973 >> >> 6866 us: 47221 >> >> 8239 us: 41483 >> >> 9887 us: 35015 >> >> 11864 us: 30249 >> >> 14237 us: 23505 >> >> 17084 us: 18364 >> >> 20501 us: 12992 >> >> 24601 us: 8837 >> >> 29521 us: 6497 >> >> 35425 us: 4840 >> >> 42510 us: 3536 >> >> 51012 us: 2471 >> >> 61214 us: 2784 >> >> 73457 us: 3664 >> >> 88148 us: 3219 >> >> 105778 us: 1980 >> >> 126934 us: 1050 >> >> 152321 us: 519 >> >> 182785 us: 184 >> >> 219342 us: 82 >> >> 263210 us: 39 >> >> 315852 us: 22 >> >> 379022 us: 16 >> >> 454826 us: 0 >> >> 545791 us: 0 >> >> 654949 us: 1 >> >> 785939 us: 1 >> >> 943127 us: 0 >> >> 1131752 us: 0 >> >> 1358102 us: 0 >> >> 1629722 us: 3 >> >> 1955666 us: 3 >> >> 2346799 us: 1 >> >> 2816159 us: 0 >> >> 3379391 us: 0 >> >> 4055269 us: 0 >> >> 4866323 us: 0 >> >> 5839588 us: 0 >> >> 7007506 us: 4 >> >> >> Partition Size (bytes) >> >> 35 bytes: 31 >> >> 42 bytes: 0 >> >> 50 bytes: 0 >> >> 60 bytes: 0 >> >> 72 bytes: 0 >> >> 86 bytes: 0 >> >> 103 bytes: 761108 >> >> 124 bytes: 23596164 >> >> 149 bytes: 22963275 >> >> 179 bytes: 3517971 >> >> 215 bytes: 6395335 >> >> 258 bytes: 6197982 >> >> 310 bytes: 4662589 >> >> 372 bytes: 5261934 >> >> 446 bytes: 3032032 >> >> 535 bytes: 3775310 >> >> 642 bytes: 2879935 >> >> 770 bytes: 3002743 >> >> 924 bytes: 2279900 >> >> 1109 bytes: 1976357 >> >> 1331 bytes: 1643875 >> >> 1597 bytes: 1422199 >> >> 1916 bytes: 1142746 >> >> 2299 bytes: 901584 >> >> 2759 bytes: 760173 >> >> 3311 bytes: 596874 >> >> 3973 bytes: 480502 >> >> 4768 bytes: 380185 >> >> 5722 bytes: 303315 >> >> 6866 bytes: 239013 >> >> 8239 bytes: 187337 >> >> 9887 bytes: 147218 >> >> 11864 bytes: 115684 >> >> 14237 bytes: 89718 >> >> 17084 bytes: 69160 >> >> 20501 bytes: 53070 >> >> 24601 bytes: 40496 >> >> 29521 bytes: 30344 >> >> 35425 bytes: 22656 >> >> 42510 bytes: 16419 >> >> 51012 bytes: 11621 >> >> 61214 bytes: 8020 >> >> 73457 bytes: 5561 >> >> 88148 bytes: 3708 >> >> 105778 bytes: 2434 >> >> 126934 bytes: 1561 >> >> 152321 bytes: 1017 >> >> 182785 bytes: 559 >> >> 219342 bytes: 349 >> >> 263210 bytes: 181 >> >> 315852 bytes: 121 >> >> 379022 bytes: 66 >> >> 454826 bytes: 22 >> >> 545791 bytes: 14 >> >> 654949 bytes: 10 >> >> 785939 bytes: 2 >> >> 943127 bytes: 0 >> >> 1131752 bytes: 4 >> >> 1358102 bytes: 2 >> >> 1629722 bytes: 1 >> >> 1955666 bytes: 2 >> >> 2346799 bytes: 0 >> >> 2816159 bytes: 0 >> >> 3379391 bytes: 0 >> >> 4055269 bytes: 0 >> >> 4866323 bytes: 1 >> >> >> Cell Count per Partition >> >> 1 cells: 50599458 >> >> 2 cells: 13975363 >> >> 3 cells: 7927898 >> >> 4 cells: 4981142 >> >> 5 cells: 3081166 >> >> 6 cells: 3034895 >> >> 7 cells: 1896720 >> >> 8 cells: 1508849 >> >> 10 cells: 2418845 >> >> 12 cells: 1631289 >> >> 14 cells: 1179269 >> >> 17 cells: 1301647 >> >> 20 cells: 919744 >> >> 
24 cells: 864313 >> >> 29 cells: 735203 >> >> 35 cells: 597749 >> >> 42 cells: 469611 >> >> 50 cells: 367422 >> >> 60 cells: 312697 >> >> 72 cells: 251782 >> >> 86 cells: 197717 >> >> 103 cells: 161244 >> >> 124 cells: 132074 >> >> 149 cells: 103711 >> >> 179 cells: 81760 >> >> 215 cells: 63360 >> >> 258 cells: 48982 >> >> 310 cells: 37932 >> >> 372 cells: 28328 >> >> 446 cells: 21347 >> >> 535 cells: 15373 >> >> 642 cells: 10959 >> >> 770 cells: 7643 >> >> 924 cells: 5346 >> >> 1109 cells: 3613 >> >> 1331 cells: 2291 >> >> 1597 cells: 1558 >> >> 1916 cells: 885 >> >> 2299 cells: 577 >> >> 2759 cells: 327 >> >> 3311 cells: 169 >> >> 3973 cells: 116 >> >> 4768 cells: 57 >> >> 5722 cells: 23 >> >> 6866 cells: 15 >> >> 8239 cells: 9 >> >> 9887 cells: 3 >> >> 11864 cells: 0 >> >> 14237 cells: 5 >> >> 17084 cells: 1 >> >> 20501 cells: 0 >> >> 24601 cells: 2 >> >> 29521 cells: 0 >> >> 35425 cells: 0 >> >> 42510 cells: 0 >> >> 51012 cells: 0 >> >> 61214 cells: 1 >> >> >> CREATE TABLE global_user_events ( >> >> user_id timeuuid, >> >> app_id int, >> >> type text, >> >> name text, >> >> extra_param map<text, text>, >> >> paid boolean, >> >> value float, >> >> PRIMARY KEY (user_id, app_id, type, name) >> >> ) WITH >> >> bloom_filter_fp_chance=0.100000 AND >> >> caching='KEYS_ONLY' AND >> >> comment='' AND >> >> dclocal_read_repair_chance=0.000000 AND >> >> gc_grace_seconds=86400 AND >> >> index_interval=128 AND >> >> read_repair_chance=0.100000 AND >> >> replicate_on_write='true' AND >> >> populate_io_cache_on_flush='false' AND >> >> default_time_to_live=0 AND >> >> speculative_retry='99.0PERCENTILE' AND >> >> memtable_flush_period_in_ms=0 AND >> >> compaction={'class': 'LeveledCompactionStrategy'} AND >> >> compression={'sstable_compression': 'LZ4Compressor'}; >> >> >> users/global_user_events histograms >> >> >> SSTables per Read >> >> 1 sstables: 1024446 >> >> 2 sstables: 632948 >> >> 3 sstables: 230768 >> >> 4 sstables: 112428 >> >> 5 sstables: 82651 >> >> 6 sstables: 61439 >> >> 7 sstables: 47518 >> >> 8 sstables: 38490 >> >> 10 sstables: 58054 >> >> 12 sstables: 41128 >> >> 14 sstables: 29642 >> >> 17 sstables: 31056 >> >> 20 sstables: 20474 >> >> 24 sstables: 17799 >> >> 29 sstables: 13744 >> >> 35 sstables: 9485 >> >> 42 sstables: 5933 >> >> 50 sstables: 3607 >> >> 60 sstables: 1618 >> >> 72 sstables: 263 >> >> 86 sstables: 10 >> >> >> Write Latency (microseconds) >> >> 10 us: 10 >> >> 12 us: 75 >> >> 14 us: 319 >> >> 17 us: 3550 >> >> 20 us: 20617 >> >> 24 us: 105254 >> >> 29 us: 311290 >> >> 35 us: 494404 >> >> 42 us: 516301 >> >> 50 us: 534061 >> >> 60 us: 421438 >> >> 72 us: 314377 >> >> 86 us: 375240 >> >> 103 us: 333402 >> >> 124 us: 816891 >> >> 149 us: 667832 >> >> 179 us: 430659 >> >> 215 us: 336063 >> >> 258 us: 380606 >> >> 310 us: 235592 >> >> 372 us: 68679 >> >> 446 us: 14988 >> >> 535 us: 6169 >> >> 642 us: 3763 >> >> 770 us: 2591 >> >> 924 us: 2270 >> >> 1109 us: 1985 >> >> 1331 us: 1744 >> >> 1597 us: 1564 >> >> 1916 us: 1489 >> >> 2299 us: 1326 >> >> 2759 us: 1114 >> >> 3311 us: 1024 >> >> 3973 us: 984 >> >> 4768 us: 833 >> >> 5722 us: 962 >> >> 6866 us: 1076 >> >> 8239 us: 891 >> >> 9887 us: 611 >> >> 11864 us: 489 >> >> 14237 us: 324 >> >> 17084 us: 306 >> >> 20501 us: 222 >> >> 24601 us: 426 >> >> 29521 us: 791 >> >> 35425 us: 820 >> >> 42510 us: 534 >> >> 51012 us: 530 >> >> 61214 us: 946 >> >> 73457 us: 1122 >> >> 88148 us: 714 >> >> 105778 us: 313 >> >> 126934 us: 127 >> >> 152321 us: 38 >> >> 182785 us: 3 >> >> 219342 us: 2 >> >> 263210 us: 0 >> >> 315852 us: 0 >> >> 
379022 us: 0 >> >> 454826 us: 2 >> >> 545791 us: 0 >> >> 654949 us: 3 >> >> 785939 us: 0 >> >> 943127 us: 0 >> >> 1131752 us: 0 >> >> 1358102 us: 0 >> >> 1629722 us: 0 >> >> 1955666 us: 0 >> >> 2346799 us: 1 >> >> 2816159 us: 0 >> >> 3379391 us: 0 >> >> 4055269 us: 0 >> >> 4866323 us: 0 >> >> 5839588 us: 0 >> >> 7007506 us: 1 >> >> >> Read Latency (microseconds) >> >> 42 us: 1 >> >> 50 us: 2 >> >> 60 us: 2 >> >> 72 us: 4 >> >> 86 us: 15 >> >> 103 us: 23 >> >> 124 us: 29 >> >> 149 us: 48 >> >> 179 us: 67 >> >> 215 us: 80 >> >> 258 us: 316 >> >> 310 us: 1812 >> >> 372 us: 4999 >> >> 446 us: 8876 >> >> 535 us: 14586 >> >> 642 us: 24417 >> >> 770 us: 31943 >> >> 924 us: 37453 >> >> 1109 us: 77449 >> >> 1331 us: 139135 >> >> 1597 us: 177538 >> >> 1916 us: 169568 >> >> 2299 us: 177617 >> >> 2759 us: 202552 >> >> 3311 us: 198101 >> >> 3973 us: 179807 >> >> 4768 us: 159149 >> >> 5722 us: 142282 >> >> 6866 us: 126124 >> >> 8239 us: 111529 >> >> 9887 us: 97658 >> >> 11864 us: 84913 >> >> 14237 us: 70905 >> >> 17084 us: 54917 >> >> 20501 us: 40623 >> >> 24601 us: 29545 >> >> 29521 us: 21448 >> >> 35425 us: 15255 >> >> 42510 us: 11222 >> >> 51012 us: 8150 >> >> 61214 us: 7473 >> >> 73457 us: 10060 >> >> 88148 us: 10530 >> >> 105778 us: 7573 >> >> 126934 us: 4331 >> >> 152321 us: 2097 >> >> 182785 us: 836 >> >> 219342 us: 255 >> >> 263210 us: 84 >> >> 315852 us: 28 >> >> 379022 us: 8 >> >> 454826 us: 2 >> >> 545791 us: 0 >> >> 654949 us: 2 >> >> 785939 us: 1 >> >> 943127 us: 0 >> >> 1131752 us: 0 >> >> 1358102 us: 0 >> >> 1629722 us: 2 >> >> 1955666 us: 6 >> >> 2346799 us: 18 >> >> 2816159 us: 0 >> >> 3379391 us: 0 >> >> 4055269 us: 0 >> >> 4866323 us: 0 >> >> 5839588 us: 0 >> >> 7007506 us: 34 >> >> >> Partition Size (bytes) >> >> 35 bytes: 25 >> >> 42 bytes: 0 >> >> 50 bytes: 0 >> >> 60 bytes: 0 >> >> 72 bytes: 21761 >> >> 86 bytes: 694466658 >> >> 103 bytes: 219007195 >> >> 124 bytes: 7034596 >> >> 149 bytes: 110269079 >> >> 179 bytes: 101276811 >> >> 215 bytes: 38518027 >> >> 258 bytes: 103242356 >> >> 310 bytes: 51434914 >> >> 372 bytes: 38048245 >> >> 446 bytes: 42558831 >> >> 535 bytes: 32869176 >> >> 642 bytes: 17188026 >> >> 770 bytes: 14750559 >> >> 924 bytes: 15963256 >> >> 1109 bytes: 16125324 >> >> 1331 bytes: 11462272 >> >> 1597 bytes: 6810452 >> >> 1916 bytes: 5685739 >> >> 2299 bytes: 10232324 >> >> 2759 bytes: 5138522 >> >> 3311 bytes: 5847998 >> >> 3973 bytes: 4065752 >> >> 4768 bytes: 3468263 >> >> 5722 bytes: 2545431 >> >> 6866 bytes: 1701913 >> >> 8239 bytes: 993808 >> >> 9887 bytes: 495282 >> >> 11864 bytes: 208644 >> >> 14237 bytes: 74307 >> >> 17084 bytes: 22286 >> >> 20501 bytes: 5474 >> >> 24601 bytes: 1225 >> >> 29521 bytes: 210 >> >> 35425 bytes: 97 >> >> 42510 bytes: 48 >> >> 51012 bytes: 21 >> >> 61214 bytes: 15 >> >> 73457 bytes: 10 >> >> 88148 bytes: 3 >> >> 105778 bytes: 4 >> >> 126934 bytes: 1 >> >> 152321 bytes: 2 >> >> 182785 bytes: 1 >> >> >> Cell Count per Partition >> >> 1 cells: 914323128 >> >> 2 cells: 196020828 >> >> 3 cells: 165502164 >> >> 4 cells: 71735648 >> >> 5 cells: 54450466 >> >> 6 cells: 28995756 >> >> 7 cells: 9625976 >> >> 8 cells: 10257788 >> >> 10 cells: 24004757 >> >> 12 cells: 13087147 >> >> 14 cells: 13588347 >> >> 17 cells: 11976353 >> >> 20 cells: 6245374 >> >> 24 cells: 5627821 >> >> 29 cells: 9480533 >> >> 35 cells: 6008276 >> >> 42 cells: 5221357 >> >> 50 cells: 4355454 >> >> 60 cells: 3955488 >> >> 72 cells: 2872578 >> >> 86 cells: 1914447 >> >> 103 cells: 1216276 >> >> 124 cells: 652035 >> >> 149 cells: 279565 >> >> 179 cells: 99836 >> 
>> 215 cells: 29132 >> >> 258 cells: 6682 >> >> 310 cells: 1276 >> >> 372 cells: 258 >> >> 446 cells: 96 >> >> 535 cells: 48 >> >> 642 cells: 25 >> >> 770 cells: 9 >> >> 924 cells: 9 >> >> 1109 cells: 4 >> >> 1331 cells: 3 >> >> 1597 cells: 1 >> >> 1916 cells: 2 >> >> From: Jack Krupansky <j...@basetechnology.com> >> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> >> Date: Thursday, July 24, 2014 at 4:07 PM >> >> To: "user@cassandra.apache.org" <user@cassandra.apache.org> >> Subject: Re: Hot, large row >> >> Could it be some “fat columns” (cells with large blob or text values) >> rather than the cell-count per se? IOW, a “big row” rather than a “wide >> row”? >> >> And, could it be a large partition rather than a large row (many rows in >> a single partition)? Are clustering columns being used in the primary key? >> >> -- Jack Krupansky >> >> *From:* DuyHai Doan <doanduy...@gmail.com> >> *Sent:* Thursday, July 24, 2014 3:53 PM >> *To:* user@cassandra.apache.org >> *Subject:* Re: Hot, large row >> >> Your extract of cfhistograms show that there are no particular "wide >> rows". The widest has 61214 cells which is big but not that huge to be >> really a concern. >> >> Turning on trace probabilty only tells give you some "hints" about what >> kind of queries are done, it does not give the exact partition key nor >> other statement values, especially when you are using prepared statements >> ... >> >> >> "I am considering reducing read_request_timeout_in_ms: 5000 in >> cassandra.yaml so that it reduces the impact when this occurs." --> Don't >> do that, you'll only sweep dust under the carpet. Find the real issue and >> fix it instead of changing parameter to hide it. >> >> One solution would be on client side, to activate some logging to show >> the CQL3 statements the application is issuing that may overload the >> server. I know that's better said than done but I don't have any other >> idea for the moment >> >> -------- Shameless self-promotion ------ >> >> To support this kind of live prod debugging & investigation that I add a >> new dynamic query logging feature in Achilles: >> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging >> >> Once you hit the issue, this kind of feature may save your day... >> >> >> >> >> >> >> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kwri...@nanigans.com> >> wrote: >> >>> I can see from cfhistograms that I do have some wide rows (see below). >>> I set trace probability as you suggested but the output doesn’t appear to >>> tell me what row was actually read unless I missed something. I just see >>> executing prepared statement. Any ideas how I can find the row in >>> question? >>> >>> I am considering reducing read_request_timeout_in_ms: 5000 in >>> cassandra.yaml so that it reduces the impact when this occurs. 
>>> >>> Any help in identifying my issue would be GREATLY appreciated >>> >>> >>> Cell Count per Partition >>> >>> 1 cells: 50449950 >>> >>> 2 cells: 14281828 >>> >>> 3 cells: 8093366 >>> >>> 4 cells: 5029200 >>> >>> 5 cells: 3103023 >>> >>> 6 cells: 3059903 >>> >>> 7 cells: 1903018 >>> >>> 8 cells: 1509297 >>> >>> 10 cells: 2420359 >>> >>> 12 cells: 1624895 >>> >>> 14 cells: 1171678 >>> >>> 17 cells: 1289391 >>> >>> 20 cells: 909777 >>> >>> 24 cells: 852081 >>> >>> 29 cells: 722925 >>> >>> 35 cells: 587067 >>> >>> 42 cells: 459473 >>> >>> 50 cells: 358744 >>> >>> 60 cells: 304146 >>> >>> 72 cells: 244682 >>> >>> 86 cells: 191045 >>> >>> 103 cells: 155337 >>> >>> 124 cells: 127061 >>> >>> 149 cells: 98913 >>> >>> 179 cells: 77454 >>> >>> 215 cells: 59849 >>> >>> 258 cells: 46117 >>> >>> 310 cells: 35321 >>> >>> 372 cells: 26319 >>> >>> 446 cells: 19379 >>> >>> 535 cells: 13783 >>> >>> 642 cells: 9993 >>> >>> 770 cells: 6973 >>> >>> 924 cells: 4713 >>> >>> 1109 cells: 3229 >>> >>> 1331 cells: 2062 >>> >>> 1597 cells: 1338 >>> >>> 1916 cells: 773 >>> >>> 2299 cells: 495 >>> >>> 2759 cells: 268 >>> >>> 3311 cells: 150 >>> >>> 3973 cells: 100 >>> >>> 4768 cells: 42 >>> >>> 5722 cells: 24 >>> >>> 6866 cells: 12 >>> >>> 8239 cells: 9 >>> >>> 9887 cells: 3 >>> >>> 11864 cells: 0 >>> >>> 14237 cells: 5 >>> >>> 17084 cells: 1 >>> >>> 20501 cells: 0 >>> >>> 24601 cells: 2 >>> >>> 29521 cells: 0 >>> >>> 35425 cells: 0 >>> >>> 42510 cells: 0 >>> >>> 51012 cells: 0 >>> >>> 61214 cells: 2 >>> >>> From: DuyHai Doan <doanduy...@gmail.com> >>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> >>> Date: Thursday, July 24, 2014 at 3:01 PM >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org> >>> Subject: Re: Hot, large row >>> >>> "How can I detect wide rows?" --> >>> >>> nodetool cfhistograms <keyspace> <suspected column family> >>> >>> Look at column "Column count" (last column) and identify a line in this >>> column with very high value of "Offset". In a well designed application you >>> should have a gaussian distribution where 80% of your row have a similar >>> number of columns. >>> >>> "Anyone know what debug level I can set so that I can see what reads the >>> hot node is handling? " --> >>> >>> "nodetool settraceprobability <value>", where value is a small number >>> (0.001) on the node where you encounter the issue. Activate the tracing for >>> a while (5 mins) and deactivate it (value = 0). Then look into >>> system_traces tables "events" & "sessions". It may help or not since the >>> tracing is done once every 1000. >>> >>> "Any way to get the server to blacklist these wide rows automatically?" >>> --> No >>> >>> >>> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kwri...@nanigans.com> >>> wrote: >>> >>>> Hi all, >>>> >>>> We are seeing an issue where basically daily one of our nodes spikes >>>> in load and is churning in CMS heap pressure. It appears that reads are >>>> backing up and my guess is that our application is reading a large row >>>> repeatedly. Our write structure can lead itself to wide rows very >>>> infrequently (<0.001%) and we do our best to detect and delete them but >>>> obviously we’re missing a case. Hoping for assistance on the following >>>> questions: >>>> >>>> - How can I detect wide rows? >>>> - Anyone know what debug level I can set so that I can see what >>>> reads the hot node is handling? I’m hoping to see the “bad” row >>>> - Any way to get the server to blacklist these wide rows >>>> automatically? 
>>>> >>>> We’re using C* 2.0.6 with Vnodes. >>>> >>>> Thanks >>>> >>> >>> >> >> > >