Hi Dan,

I'll try to go through all the elements:
> seeing this odd behavior happen, seemingly to single nodes at a time

Is it one node at a time, or always the same node? Do you consider your data model to be fairly evenly distributed?

> The node starts to take more and more memory (instance has 48GB memory on
> G1GC)

Is 48 GB the heap size, or the total amount of memory on the node? Could we have your JVM settings (GC and heap sizes), your memtable size and type (on heap or off heap?), and the amount of available memory? (The last sketch at the end of this mail shows the commands I would use to collect that.)

> Note that there is a decent number of compactions going on as well but that
> is expected on these nodes and this particular one is catching up from a
> high volume of writes

Are the *concurrent_compactors* correctly throttled (about 8 on good machines), and is *compaction_throughput_mb_per_sec* high enough to cope with what is thrown at the node? On SSDs I often see the latter unthrottled (set to 0), but I would try small increments first (see the sketches at the end of this mail).

> Also interestingly, neither CPU nor disk utilization are pegged while this
> is going on

The first thing is to make sure your memory management is fine; having information about the JVM and memory usage globally would help. Then, if you are not fully using the resources, you might want to try increasing *concurrent_writes* to a higher value (probably way higher, given the pending requests, but go safely and incrementally, first on a canary node) and monitor tpstats + resources. Hopefully this will help bring MutationStage pending back down. My guess is that the pending requests are messing with the JVM, but it could be the exact contrary as well.

> Native-Transport-Requests        25         0      547935519         0           2586907

About native transport requests being blocked, you can probably mitigate things by increasing native_transport_max_threads: 128 (try doubling it and continue tuning incrementally). Also, an up-to-date client using native protocol V3 handles connections / threads from clients a lot better. With a throughput as heavy as yours, you might want to give this a try. What is your current client?

What does "netstat -an | grep -e 9042 -e 9160 | grep ESTABLISHED | wc -l" output? This is the number of clients connected to the node.

Do you have other significant errors or warnings in the logs (other than dropped mutations)? "grep -i -e "ERROR" -e "WARN" /var/log/cassandra/system.log"

As a small conclusion, I would keep an eye on anything related to memory management, and also try pushing Cassandra's limits by increasing the default values, since you seem to have resources available, to make sure Cassandra can cope with the high throughput. Pending operations = high memory pressure. Reducing the pending work somehow will probably get you out of trouble.
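To make the tuning above concrete, here is a minimal cassandra.yaml sketch. The numbers are illustrative starting points only (I cannot size them properly without knowing your hardware), so treat each one as a hypothesis to test on a canary node:

    # cassandra.yaml -- illustrative starting points, not recommendations
    concurrent_compactors: 8              # roughly one per core, capped around 8
    compaction_throughput_mb_per_sec: 64  # raise in small steps; 0 = unthrottled (SSD only)
    concurrent_writes: 128                # default is 32; increase incrementally, canary first
    native_transport_max_threads: 256     # default is 128; try doubling and observe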
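Note that the compaction throughput can also be changed on a live node, without a restart, which makes it easy to experiment while watching the pending counts:

    nodetool getcompactionthroughput       # show the current MB/s cap
    nodetool setcompactionthroughput 64    # set a new cap in MB/s; 0 removes the throttle
    nodetool compactionstats               # pending / active compactions
    watch -d 'nodetool tpstats'            # watch the Pending columns evolve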
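And for the JVM and memory questions, the output of something like the following would tell us a lot (a sketch, assuming a standard Linux package install with nodetool on the PATH; nodetool gcstats needs a recent 2.1/2.2 release):

    # heap and GC flags of the running process
    ps -ef | grep '[C]assandraDaemon' | tr ' ' '\n' | grep -e '-Xm' -e 'G1' -e 'GCPause'
    # heap / off-heap usage as reported by Cassandra
    nodetool info | grep -i -e 'heap' -e 'off heap'
    # GC pause statistics since the last call
    nodetool gcstats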
Hope this first round of ideas will help you.

C*heers,

-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-02 22:58 GMT+01:00 Dan Kinder <dkin...@turnitin.com>:

> Also should note: Cassandra 2.2.5, Centos 6.7
>
> On Wed, Mar 2, 2016 at 1:34 PM, Dan Kinder <dkin...@turnitin.com> wrote:
>
>> Hi y'all,
>>
>> I am writing to a cluster fairly fast and seeing this odd behavior
>> happen, seemingly to single nodes at a time. The node starts to take
>> more and more memory (instance has 48GB memory on G1GC). tpstats shows
>> that MemtableReclaimMemory Pending starts to grow first, then later
>> MutationStage builds up as well. By then most of the memory is being
>> consumed, GC is getting longer, the node slows down, and everything
>> slows down unless I kill the node. Also the number of Active
>> MemtableReclaimMemory threads seems to stay at 1. Also interestingly,
>> neither CPU nor disk utilization are pegged while this is going on;
>> it's on jbod and there is plenty of headroom there. (Note that there is
>> a decent number of compactions going on as well, but that is expected
>> on these nodes and this particular one is catching up from a high
>> volume of writes.)
>>
>> Anyone have any theories on why this would be happening?
>>
>> $ nodetool tpstats
>> Pool Name                    Active   Pending    Completed   Blocked  All time blocked
>> MutationStage                   192    715481    311327142         0                 0
>> ReadStage                         7         0      9142871         0                 0
>> RequestResponseStage              1         0    690823199         0                 0
>> ReadRepairStage                   0         0      2145627         0                 0
>> CounterMutationStage              0         0            0         0                 0
>> HintedHandoff                     0         0          144         0                 0
>> MiscStage                         0         0            0         0                 0
>> CompactionExecutor               12        24        41022         0                 0
>> MemtableReclaimMemory             1       102         4263         0                 0
>> PendingRangeCalculator            0         0           10         0                 0
>> GossipStage                       0         0       148329         0                 0
>> MigrationStage                    0         0            0         0                 0
>> MemtablePostFlush                 0         0         5233         0                 0
>> ValidationExecutor                0         0            0         0                 0
>> Sampler                           0         0            0         0                 0
>> MemtableFlushWriter               0         0         4270         0                 0
>> InternalResponseStage             0         0     16322698         0                 0
>> AntiEntropyStage                  0         0            0         0                 0
>> CacheCleanupExecutor              0         0            0         0                 0
>> Native-Transport-Requests        25         0    547935519         0           2586907
>>
>> Message type           Dropped
>> READ                         0
>> RANGE_SLICE                  0
>> _TRACE                       0
>> MUTATION                287057
>> COUNTER_MUTATION             0
>> REQUEST_RESPONSE             0
>> PAGED_RANGE                  0
>> READ_REPAIR                149
>
> --
> Dan Kinder
> Principal Software Engineer
> Turnitin – www.turnitin.com
> dkin...@turnitin.com