Look in gc.log for full GCs or long pauses, and also work out how much
total time the JVM is spending in GC as a percentage of wall-clock time.
The percentage is probably the more telling number here, since long pauses
would show up as one core pegged and the rest idle, and that's not what
you're seeing.  G1 handles large heaps well, so it's worth bumping the heap
to 20-27GB just to see what happens.
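
As a rough way to get that percentage, here is a minimal sketch in Python.
It assumes gc.log is in the Java 8 format and contains the "Total time for
which application threads were stopped" lines written when
-XX:+PrintGCApplicationStoppedTime is on, each preceded by a JVM uptime
stamp. The function name and the 1-second "long pause" threshold are
illustrative choices, not anything Cassandra defines, and the stopped time
covers all safepoints, not only GC.

```python
import re

# Assumed Java 8 line shape, e.g.
# "...: 105.000: Total time for which application threads were stopped: 1.5000000 seconds, ..."
STOPPED = re.compile(
    r"(?P<uptime>\d+\.\d+): Total time for which application threads "
    r"were stopped: (?P<pause>\d+\.\d+) seconds"
)


def gc_pause_summary(lines, long_pause_s=1.0):
    """Return (percent of the observed window spent stopped, long pauses).

    The window runs from the first matching line to the last one, measured
    in JVM uptime seconds. The first pause is logged at the window start,
    so it is left out of the total.
    """
    first = last = None
    total = 0.0
    long_pauses = []
    for line in lines:
        m = STOPPED.search(line)
        if not m:
            continue
        uptime = float(m.group("uptime"))
        pause = float(m.group("pause"))
        if pause >= long_pause_s:
            long_pauses.append((uptime, pause))
        if first is None:
            first = uptime
        else:
            total += pause
        last = uptime
    if first is None or last == first:
        return 0.0, long_pauses
    return 100.0 * total / (last - first), long_pauses
```

Calling it as gc_pause_summary(open("gc.log")) gives both numbers at once.
A stopped percentage in the low single digits with no long pauses would
point away from GC; a large percentage or a list of multi-second pauses
would point towards it.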

If it's not GC, then you're just running out of CPU and need more, or need
to figure out what queries are killing it.
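
The StatusLogger output in the original message already hints at where the
work is piling up: ReadStage shows 192 active and 1331 pending while READ
messages are being dropped. A small sketch for pulling backed-up pools out
of system.log follows. It assumes each pool is on a single unwrapped line
(the quoted copy in this thread is wrapped by the mail client), and the
function name and the min_pending parameter are made up for illustration.

```python
import re

# Pool lines: name, then Active, Pending, Completed, Blocked, All Time Blocked.
POOL = re.compile(
    r"StatusLogger\.java:\d+ - (?P<pool>\S+)\s+(?P<active>\d+)\s+"
    r"(?P<pending>\d+)\s+(?P<completed>\d+)\s+(?P<blocked>\d+)\s+"
    r"(?P<all_blocked>\d+)\s*$"
)


def backed_up_pools(lines, min_pending=1):
    """Return (pool, active, pending) for pools with a pending backlog."""
    hits = []
    for line in lines:
        m = POOL.search(line)
        if m and int(m.group("pending")) >= min_pending:
            hits.append(
                (m.group("pool"), int(m.group("active")), int(m.group("pending")))
            )
    return hits
```

Header, cache and per-table lines do not have five numeric columns, so they
are skipped. If ReadStage is the only pool that keeps appearing, the next
step is finding which read queries are the expensive ones.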

On Thu, Aug 20, 2020 at 10:45 AM Lee Tewksbury <extpi...@gmail.com> wrote:

> Depending on your thread count, you can consider increasing the max native
> transport threads and concurrent reads. But the keys to Cassandra are
> pretty much these: make good data, make good queries, and if you can't keep
> up, double the cluster size. If you're following the documentation on heap
> size (1/2 RAM or 20GB, whichever is lower), then I would suggest increasing
> threads, but more importantly increasing node count.
>
> On Thu, Aug 20, 2020 at 10:33 AM Krish Donald <gotomyp...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We have a cluster where, if reads suddenly increase 2-3 times, Cassandra
>> CPU goes to around 100% on a few nodes (we have 48-CPU machines with 128GB
>> RAM) and Cassandra becomes unresponsive.
>> We are on 3.11.5 and using G1GC with a 16GB heap.
>> Going through system.log and gc.log, I see that system.log just prints
>> messages like the ones below every 5 seconds (I have removed the lines for
>> many keyspaces to reduce the size of the text), and a lot of messages are
>> getting printed in gc.log. I feel that maybe I need to increase the heap
>> size on these nodes, but I wanted to understand how we determine whether
>> the heap size should be increased or not. Nodes are not dying due to OOMs.
>> When we have OOMs, we know for sure we need to increase the heap size, but
>> *what should we look for in gc.log, system.log and debug.log to determine
>> whether we have to increase the heap size?*
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,368
>> MessagingService.java:1246 - READ messages were dropped in last 5000 ms:
>> 199 internal and 232 cross node. Mean internal dropped latency: 10443 ms
>> and Mean cross-node dropped latency: 10402 ms
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,369 StatusLogger.java:47 -
>> Pool Name                    Active   Pending      Completed   Blocked  All
>> Time Blocked
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,377 StatusLogger.java:51 -
>> MutationStage                     0         0       80051890         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 -
>> ViewMutationStage                 0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 -
>> ReadStage                       192      1331      152624049         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 -
>> RequestResponseStage              0         0      172822890         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 -
>> ReadRepairStage                   0         0        1545869         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 -
>> CounterMutationStage              0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 -
>> MiscStage                         0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 -
>> CompactionExecutor                0         0         623536         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 -
>> MemtableReclaimMemory             0         0           6700         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 -
>> PendingRangeCalculator            0         0             18         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 -
>> GossipStage                       0         0        1613366         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 -
>> SecondaryIndexManagement          0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 -
>> HintsDispatcher                   0         0              5         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 -
>> MigrationStage                    0         0              1         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 -
>> MemtablePostFlush                 0         0          14830         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_0         0         0           6700         0
>>               0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 -
>> ValidationExecutor                0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 -
>> Sampler                           0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 -
>> MemtableFlushWriter               0         0           6700         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 -
>> InternalResponseStage             0         0          33229         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 -
>> AntiEntropyStage                  0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 -
>> CacheCleanupExecutor              0         0              0         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 -
>> Native-Transport-Requests       661         0       84577742         0
>>             0
>>
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:61 -
>> CompactionManager                 0         0
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:73 -
>> MessagingService                n/a       0/0
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:83 -
>> Cache Type                     Size                 Capacity
>> KeysToSave
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:85 -
>> KeyCache                  104857576                104857600
>>        all
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:91 -
>> RowCache                          0                        0
>>        all
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:98 -
>> Table                       Memtable ops,data
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,429 StatusLogger.java:101 -
>> system_distributed.parent_repair_history                 0,0
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,429 StatusLogger.java:101 -
>> system_distributed.repair_history                 0,0
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 -
>> system_distributed.view_build_status                 0,0
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 -
>> system.compaction_history             12,3327
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 -
>> system.schema_aggregates                  0,0
>> INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 -
>> system.schema_triggers                    0,0
>>
>> Thanks
>>
>
