Hi y'all,

I am writing to a cluster at a fairly high rate and seeing some odd behavior
that seems to hit a single node at a time. The affected node starts to take
more and more memory (the instance has 48GB of memory and uses G1GC). tpstats
shows that MemtableReclaimMemory Pending starts to grow first, then later
MutationStage builds up as well. By then most of the memory is consumed, GC
pauses are getting longer, the node slows down, and everything else slows
down with it unless I kill the node. The number of Active
MemtableReclaimMemory threads also seems to stay at 1. Interestingly, neither
CPU nor disk utilization is pegged while this is going on; the node is on
JBOD and there is plenty of headroom there. (Note that there is a decent
number of compactions going on as well, but that is expected on these nodes,
and this particular one is catching up from a high volume of writes.)

Anyone have any theories on why this would be happening?


$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                   192    715481      311327142         0                 0
ReadStage                         7         0        9142871         0                 0
RequestResponseStage              1         0      690823199         0                 0
ReadRepairStage                   0         0        2145627         0                 0
CounterMutationStage              0         0              0         0                 0
HintedHandoff                     0         0            144         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor               12        24          41022         0                 0
MemtableReclaimMemory             1       102           4263         0                 0
PendingRangeCalculator            0         0             10         0                 0
GossipStage                       0         0         148329         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlush                 0         0           5233         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           4270         0                 0
InternalResponseStage             0         0       16322698         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Native-Transport-Requests        25         0      547935519         0           2586907

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                287057
COUNTER_MUTATION             0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                149
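
To track which pool backs up first over time, the tpstats output can be
parsed and the Pending column compared against a threshold. The following is
only a minimal sketch: the names parse_tpstats, backed_up and the threshold
value are hypothetical, and it assumes each pool is printed on a single
unwrapped line with the six columns shown in the header. In practice the text
would come from running `nodetool tpstats` periodically; here a few rows from
the output are embedded as a sample.

```python
import re

# One thread-pool row: name, Active, Pending, Completed, Blocked,
# All time blocked. Header and "Dropped" rows do not match this shape.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_tpstats(text):
    """Return {pool_name: {column: int}} for every thread-pool row."""
    pools = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip header, blank lines, and dropped-message rows
        active, pending, completed, blocked, all_blocked = map(int, m.groups()[1:])
        pools[m.group(1)] = {
            "active": active,
            "pending": pending,
            "completed": completed,
            "blocked": blocked,
            "all_time_blocked": all_blocked,
        }
    return pools


def backed_up(pools, threshold=100):
    """Names of pools whose Pending count exceeds the (arbitrary) threshold."""
    return sorted(name for name, s in pools.items() if s["pending"] > threshold)


# A few rows copied from the output above, one pool per line.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                   192    715481      311327142         0                 0
ReadStage                         7         0        9142871         0                 0
CompactionExecutor               12        24          41022         0                 0
MemtableReclaimMemory             1       102           4263         0                 0

Message type           Dropped
MUTATION                287057
"""

if __name__ == "__main__":
    print(backed_up(parse_tpstats(SAMPLE)))
```

Logging the result with a timestamp every few seconds would show whether
MemtableReclaimMemory crosses the threshold before MutationStage does.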
