Hello,

We recently moved from 0.6.6 to 0.6.13 on an 8-node cluster and started to
see issues on two nodes where memtables are being flushed at a high rate
and compaction seems to have fallen behind.  A huge number of sstables
have accumulated as a result of the slowed compaction.  We are also
seeing a high number of dropped reads, on these two nodes only.

Here are the log entries for the two nodes:

Node 11
2011-11-04_12:20:20.71219 '' WARN [DroppedMessagesLogger] 12:20:20,924
MessagingService.java:479 Dropped 126 READ messages in the last 5000ms
2011-11-04_12:20:20.92854 '' INFO [DroppedMessagesLogger] 12:20:20,924
GCInspector.java:143 Pool Name                    Active   Pending
2011-11-04_12:20:20.92874 '' INFO [DroppedMessagesLogger] 12:20:20,924
GCInspector.java:157 STREAM-STAGE                      0         0
2011-11-04_12:20:20.92895 '' INFO [DroppedMessagesLogger] 12:20:20,924
GCInspector.java:157 FILEUTILS-DELETE-POOL             0         0
2011-11-04_12:20:20.92915 '' INFO [FLUSH-WRITER-POOL:1] 12:20:20,924
Memtable.java:166 Completed flushing
/var/lib/cassandra/data/current/SoundCloud/Activities-487528-Data.db
(3619622 bytes)
2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,924
GCInspector.java:157 RESPONSE-STAGE                    0         0
2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,925
GCInspector.java:157 ROW-READ-STAGE                    8       348
2011-11-04_12:20:20.93264 '' INFO [DroppedMessagesLogger] 12:20:20,925
GCInspector.java:157 LB-OPERATIONS                     0         0
2011-11-04_12:20:20.93264 '' INFO [DroppedMessagesLogger] 12:20:20,925
GCInspector.java:157 MISCELLANEOUS-POOL                0         0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,925
GCInspector.java:157 GMFD                              0         0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,925
GCInspector.java:157 CONSISTENCY-MANAGER               0         0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,926
GCInspector.java:157 LB-TARGET                         0         0
2011-11-04_12:20:20.93266 '' INFO [DroppedMessagesLogger] 12:20:20,926
GCInspector.java:157 ROW-MUTATION-STAGE                0         0
2011-11-04_12:20:20.93267 '' INFO [DroppedMessagesLogger] 12:20:20,926
GCInspector.java:157 MESSAGE-STREAMING-POOL            0         0
2011-11-04_12:20:20.93267 '' INFO [DroppedMessagesLogger] 12:20:20,926
GCInspector.java:157 LOAD-BALANCER-STAGE               0         0
2011-11-04_12:20:20.93268 '' INFO [DroppedMessagesLogger] 12:20:20,926
GCInspector.java:157 FLUSH-SORTER-POOL                 0         0
2011-11-04_12:20:20.93268 '' INFO [DroppedMessagesLogger] 12:20:20,926
GCInspector.java:157 MEMTABLE-POST-FLUSHER             1         2
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:157 AE-SERVICE-STAGE                  0         0
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:157 FLUSH-WRITER-POOL                 1         2
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:157 HINTED-HANDOFF-POOL               1         6
2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:161 CompactionManager               n/a      4089
2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:165 ColumnFamily                Memtable ops,data  Row
cache size/cap  Key cache size/cap
2011-11-04_12:20:20.93271 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:168 system.LocationInfo                       0,0
        0/0                 1/3
2011-11-04_12:20:20.93272 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:168 system.HintsColumnFamily                 4,46
        0/0                 2/6
2011-11-04_12:20:20.93272 '' INFO [DroppedMessagesLogger] 12:20:20,927
GCInspector.java:168 SoundCloud.OwnActivities         28790,539601
        0/0        37303/200000
2011-11-04_12:20:20.93273 '' INFO [DroppedMessagesLogger] 12:20:20,928
GCInspector.java:168 SoundCloud.ExclusiveTracks        10230,207529
        0/0         3646/200000
2011-11-04_12:20:20.93273 '' INFO [DroppedMessagesLogger] 12:20:20,928
GCInspector.java:168 SoundCloud.Activities                    5,90
        0/0       200000/200000
2011-11-04_12:20:20.93274 '' INFO [DroppedMessagesLogger] 12:20:20,928
GCInspector.java:168 SoundCloud.IncomingTracks                 0,0
        0/0       200000/200000

Node 17
2011-11-04_12:21:55.15215 '' WARN [DroppedMessagesLogger] 12:21:55,417
MessagingService.java:479 Dropped 81 READ messages in the last 5000ms
2011-11-04_12:21:55.41788 '' INFO [DroppedMessagesLogger] 12:21:55,417
GCInspector.java:143 Pool Name                    Active   Pending
2011-11-04_12:21:55.41789 '' INFO [DroppedMessagesLogger] 12:21:55,418
GCInspector.java:157 STREAM-STAGE                      0         0
2011-11-04_12:21:55.41851 '' INFO [DroppedMessagesLogger] 12:21:55,418
GCInspector.java:157 FILEUTILS-DELETE-POOL             0         0
2011-11-04_12:21:55.41877 '' INFO [DroppedMessagesLogger] 12:21:55,418
GCInspector.java:157 RESPONSE-STAGE                    0         0
2011-11-04_12:21:55.42379 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 ROW-READ-STAGE                    8       211
2011-11-04_12:21:55.42403 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 LB-OPERATIONS                     0         0
2011-11-04_12:21:55.42427 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 MISCELLANEOUS-POOL                0         0
2011-11-04_12:21:55.42448 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 GMFD                              0         0
2011-11-04_12:21:55.42473 '' INFO [DroppedMessagesLogger] 12:21:55,419
GCInspector.java:157 CONSISTENCY-MANAGER               0         0
2011-11-04_12:21:55.42495 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 LB-TARGET                         0         0
2011-11-04_12:21:55.42515 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 ROW-MUTATION-STAGE                1         1
2011-11-04_12:21:55.42537 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 MESSAGE-STREAMING-POOL            0         0
2011-11-04_12:21:55.42561 '' INFO [DroppedMessagesLogger] 12:21:55,420
GCInspector.java:157 LOAD-BALANCER-STAGE               0         0
2011-11-04_12:21:55.42580 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 FLUSH-SORTER-POOL                 0         0
2011-11-04_12:21:55.42602 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 MEMTABLE-POST-FLUSHER             1         3
2011-11-04_12:21:55.42626 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 AE-SERVICE-STAGE                  0         0
2011-11-04_12:21:55.42649 '' INFO [DroppedMessagesLogger] 12:21:55,421
GCInspector.java:157 FLUSH-WRITER-POOL                 1         1
2011-11-04_12:21:55.42670 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:157 HINTED-HANDOFF-POOL               1         8
2011-11-04_12:21:55.42695 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:161 CompactionManager               n/a      3423
2011-11-04_12:21:55.42717 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:165 ColumnFamily                Memtable ops,data  Row
cache size/cap  Key cache size/cap
2011-11-04_12:21:55.42832 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:168 system.LocationInfo                       0,0
        0/0                 1/2
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:168 system.HintsColumnFamily                  0,0
        0/0                 1/6
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422
GCInspector.java:168 SoundCloud.OwnActivities           2545,47090
        0/0        41956/200000
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,423
GCInspector.java:168 SoundCloud.ExclusiveTracks           570,11872
        0/0         2645/200000
2011-11-04_12:21:55.42834 '' INFO [DroppedMessagesLogger] 12:21:55,423
GCInspector.java:168 SoundCloud.Activities          126085,2171439
        0/0       200000/200000
2011-11-04_12:21:55.42872 '' INFO [DroppedMessagesLogger] 12:21:55,423
GCInspector.java:168 SoundCloud.IncomingTracks       95470,1604563
        0/0       200000/200000
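
To make dumps like the two above easier to compare, here is a rough Python
sketch that pulls out only the pools with a non-zero Pending count.  The
function name pending_by_pool and the regex are my own assumptions based on
the line layout shown above; nothing here is part of Cassandra itself.

```python
import re

# Matches lines such as:
#   GCInspector.java:157 ROW-READ-STAGE                    8       348
#   GCInspector.java:161 CompactionManager               n/a      4089
# The header and ColumnFamily lines do not end in two such fields, so they are skipped.
POOL_LINE = re.compile(r"GCInspector\.java:\d+\s+(\S+)\s+(\d+|n/a)\s+(\d+)\s*$")


def pending_by_pool(log_text):
    """Return {pool_name: pending_count} for every pool with pending > 0."""
    pending = {}
    for line in log_text.splitlines():
        match = POOL_LINE.search(line)
        if match:
            name, _active, count = match.groups()
            if int(count) > 0:
                pending[name] = int(count)
    return pending


# Three lines copied from the Node 11 dump above.
sample = """\
GCInspector.java:157 ROW-READ-STAGE                    8       348
GCInspector.java:157 ROW-MUTATION-STAGE                0         0
GCInspector.java:161 CompactionManager               n/a      4089
"""

# Print the busiest pools first.
for name, count in sorted(pending_by_pool(sample).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count}")
```

On both nodes this leaves ROW-READ-STAGE and CompactionManager as the
entries with by far the largest backlogs.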

We have tried to run manual compactions, but these never seem to run,
likely due to the high pending count.

I am wondering what the best way is to figure out what is blocking on these
nodes, so that we can get compaction back in the game.

I have considered isolating one node via the network to see if it can catch
up once there is no load on it.  Not sure of the negative side effects of
that.

Any suggestions on resolving this?

Regards,

Jake

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
