Hello,

We recently upgraded an 8-node cluster from 0.6.6 to 0.6.13 and have since seen problems on two nodes: memtables are being flushed at a high rate, and compaction seems to have stalled or fallen behind. Because compaction has slowed, a huge number of SSTables has accumulated. We are also seeing a high number of dropped reads, but only on these two nodes.
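One rough way to quantify the SSTable build-up is to count data files per column family in the keyspace directory and compare the two affected nodes with a healthy one. The following is a minimal Python sketch, assuming the `<ColumnFamily>-<generation>-Data.db` naming seen in the flush message in the logs (for example `Activities-487528-Data.db`). The helpers `count_data_files` and `count_sstables` are hypothetical names for illustration, not part of Cassandra.

```python
import os
import re
from collections import Counter

# Assumption: data files are named "<ColumnFamily>-<generation>-Data.db",
# as in the flush message from the log (Activities-487528-Data.db).
DATA_FILE_RE = re.compile(r"^(?P<cf>[^-]+)-(?P<gen>\d+)-Data\.db$")


def count_data_files(filenames):
    """Count *-Data.db files per column family; other files are ignored."""
    counts = Counter()
    for name in filenames:
        match = DATA_FILE_RE.match(name)
        if match:
            counts[match.group("cf")] += 1
    return counts


def count_sstables(keyspace_dir):
    """Apply count_data_files to the entries of one keyspace directory."""
    return count_data_files(os.listdir(keyspace_dir))
```

For example, `count_sstables("/var/lib/cassandra/data/current/SoundCloud").most_common()` would list the column families with the most data files first. The path is taken from the log and may differ on other nodes.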
Here are the log entries for the two nodes:

Node 11:
2011-11-04_12:20:20.71219 '' WARN [DroppedMessagesLogger] 12:20:20,924 MessagingService.java:479 Dropped 126 READ messages in the last 5000ms
2011-11-04_12:20:20.92854 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:143 Pool Name Active Pending
2011-11-04_12:20:20.92874 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 STREAM-STAGE 0 0
2011-11-04_12:20:20.92895 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 FILEUTILS-DELETE-POOL 0 0
2011-11-04_12:20:20.92915 '' INFO [FLUSH-WRITER-POOL:1] 12:20:20,924 Memtable.java:166 Completed flushing /var/lib/cassandra/data/current/SoundCloud/Activities-487528-Data.db (3619622 bytes)
2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 RESPONSE-STAGE 0 0
2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 ROW-READ-STAGE 8 348
2011-11-04_12:20:20.93264 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 LB-OPERATIONS 0 0
2011-11-04_12:20:20.93264 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 MISCELLANEOUS-POOL 0 0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 GMFD 0 0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 CONSISTENCY-MANAGER 0 0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 LB-TARGET 0 0
2011-11-04_12:20:20.93266 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 ROW-MUTATION-STAGE 0 0
2011-11-04_12:20:20.93267 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 MESSAGE-STREAMING-POOL 0 0
2011-11-04_12:20:20.93267 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 LOAD-BALANCER-STAGE 0 0
2011-11-04_12:20:20.93268 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 FLUSH-SORTER-POOL 0 0
2011-11-04_12:20:20.93268 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 MEMTABLE-POST-FLUSHER 1 2
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:157 AE-SERVICE-STAGE 0 0
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:157 FLUSH-WRITER-POOL 1 2
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:157 HINTED-HANDOFF-POOL 1 6
2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:161 CompactionManager n/a 4089
2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:165 ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
2011-11-04_12:20:20.93271 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 system.LocationInfo 0,0 0/0 1/3
2011-11-04_12:20:20.93272 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 system.HintsColumnFamily 4,46 0/0 2/6
2011-11-04_12:20:20.93272 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 SoundCloud.OwnActivities 28790,539601 0/0 37303/200000
2011-11-04_12:20:20.93273 '' INFO [DroppedMessagesLogger] 12:20:20,928 GCInspector.java:168 SoundCloud.ExclusiveTracks 10230,207529 0/0 3646/200000
2011-11-04_12:20:20.93273 '' INFO [DroppedMessagesLogger] 12:20:20,928 GCInspector.java:168 SoundCloud.Activities 5,90 0/0 200000/200000
2011-11-04_12:20:20.93274 '' INFO [DroppedMessagesLogger] 12:20:20,928 GCInspector.java:168 SoundCloud.IncomingTracks 0,0 0/0 200000/200000

Node 17:
2011-11-04_12:21:55.15215 '' WARN [DroppedMessagesLogger] 12:21:55,417 MessagingService.java:479 Dropped 81 READ messages in the last 5000ms
2011-11-04_12:21:55.41788 '' INFO [DroppedMessagesLogger] 12:21:55,417 GCInspector.java:143 Pool Name Active Pending
2011-11-04_12:21:55.41789 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 STREAM-STAGE 0 0
2011-11-04_12:21:55.41851 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 FILEUTILS-DELETE-POOL 0 0
2011-11-04_12:21:55.41877 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 RESPONSE-STAGE 0 0
2011-11-04_12:21:55.42379 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 ROW-READ-STAGE 8 211
2011-11-04_12:21:55.42403 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 LB-OPERATIONS 0 0
2011-11-04_12:21:55.42427 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 MISCELLANEOUS-POOL 0 0
2011-11-04_12:21:55.42448 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 GMFD 0 0
2011-11-04_12:21:55.42473 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 CONSISTENCY-MANAGER 0 0
2011-11-04_12:21:55.42495 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 LB-TARGET 0 0
2011-11-04_12:21:55.42515 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 ROW-MUTATION-STAGE 1 1
2011-11-04_12:21:55.42537 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 MESSAGE-STREAMING-POOL 0 0
2011-11-04_12:21:55.42561 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 LOAD-BALANCER-STAGE 0 0
2011-11-04_12:21:55.42580 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 FLUSH-SORTER-POOL 0 0
2011-11-04_12:21:55.42602 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 MEMTABLE-POST-FLUSHER 1 3
2011-11-04_12:21:55.42626 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 AE-SERVICE-STAGE 0 0
2011-11-04_12:21:55.42649 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 FLUSH-WRITER-POOL 1 1
2011-11-04_12:21:55.42670 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:157 HINTED-HANDOFF-POOL 1 8
2011-11-04_12:21:55.42695 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:161 CompactionManager n/a 3423
2011-11-04_12:21:55.42717 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:165 ColumnFamily Memtable ops,data Row cache size/cap Key cache size/cap
2011-11-04_12:21:55.42832 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 system.LocationInfo 0,0 0/0 1/2
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 system.HintsColumnFamily 0,0 0/0 1/6
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 SoundCloud.OwnActivities 2545,47090 0/0 41956/200000
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.ExclusiveTracks 570,11872 0/0 2645/200000
2011-11-04_12:21:55.42834 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.Activities 126085,2171439 0/0 200000/200000
2011-11-04_12:21:55.42872 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.IncomingTracks 95470,1604563 0/0 200000/200000

We have tried running manual compactions, but they never seem to actually run, likely because of the high pending count. What is the best way to figure out what is blocking on these nodes, so that compaction can catch up again?

I have considered isolating one node from the network to see whether it can catch up once it is no longer under load, but I am not sure what the negative side effects of that would be.

Any suggestions on resolving this?

Regards,
Jake

--
Jake Maizel
Head of Network Operations
Soundcloud
Mail & GTalk: j...@soundcloud.com
Skype: jakecloud
Rosenthaler strasse 13, 101 19, Berlin, DE
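To see at a glance which stages are backing up, the pool-status lines printed by DroppedMessagesLogger in the logs above can be filtered for non-zero pending counts. The following is a minimal Python sketch, assuming the `Pool Name Active Pending` layout shown there (with `n/a` as CompactionManager's active count). `backlogged_pools` is a hypothetical helper name, and the regex is fitted to these excerpts only, not to any documented log format.

```python
import re

# Assumption: pool-status lines look like the excerpts in the logs, e.g.
#   "... GCInspector.java:157 ROW-READ-STAGE 8 348"
#   "... GCInspector.java:161 CompactionManager n/a 4089"
# Column-family lines ("system.LocationInfo 0,0 0/0 1/3") do not match.
POOL_RE = re.compile(
    r"GCInspector\.java:\d+\s+(?P<pool>[A-Za-z][\w-]*)\s+"
    r"(?P<active>\d+|n/a)\s+(?P<pending>\d+)\s*$"
)


def backlogged_pools(log_lines, min_pending=1):
    """Return (pool, active, pending) tuples with pending >= min_pending,
    largest pending count first. An active count of 'n/a' becomes None."""
    found = []
    for line in log_lines:
        match = POOL_RE.search(line)
        if not match:
            continue
        pending = int(match.group("pending"))
        if pending >= min_pending:
            active = match.group("active")
            found.append((match.group("pool"),
                          None if active == "n/a" else int(active),
                          pending))
    # sorted() is stable, so pools with equal pending keep log order.
    return sorted(found, key=lambda t: t[2], reverse=True)
```

Applied to node 11's excerpt, this would put CompactionManager (4089 pending) first, then ROW-READ-STAGE (348), HINTED-HANDOFF-POOL (6), and finally MEMTABLE-POST-FLUSHER and FLUSH-WRITER-POOL (2 each).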