After a few days I've also tried disabling Linux transparent huge page
defragmentation (echo never > /sys/kernel/mm/transparent_hugepage/defrag) and
turning coalescing off (otc_coalescing_strategy: DISABLED), but neither did
any good. I'm using LCS, there are no big GC pauses, and I have set
"concurrent_compactors: 5" (the machines have 16 CPUs), but there are usually
no compactions running when a load spike comes. "nodetool tpstats" shows no
active thread pools except Native-Transport-Requests (usually 0-4) and
perhaps ReadStage (usually 0-1).
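
For what it's worth, a quick way to confirm that the defrag change stuck and
to see whether the kernel is still doing compaction work (counter names as
they appear in /proc/vmstat):

  # the active setting is shown in brackets, should read [never]
  cat /sys/kernel/mm/transparent_hugepage/defrag

  # compaction / THP counters; if compact_stall keeps climbing, the kernel
  # is still compacting memory on behalf of allocations
  grep -E 'compact_|thp_' /proc/vmstat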

The symptoms are the same: after about 12-24 hours an increasing number of
nodes start to show short CPU load spikes, and this affects the median read
latencies. I ran dstat while a load spike was already under way (see
screenshot http://i.imgur.com/B0S5Zki.png), but no column other than the
load itself shows any major change, except the system/kernel CPU usage.
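
A rough sketch of what could be run during the next spike to see where the
system CPU actually goes (assuming perf and sysstat are available;
<cassandra-pid> is just a placeholder):

  # per-thread CPU usage for the Cassandra process, three 5-second samples
  pidstat -u -t -p <cassandra-pid> 5 3

  # sample user and kernel stacks on all CPUs for 30 seconds
  perf record -F 99 -a -g -- sleep 30
  perf report --sort comm,dso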

Any further ideas on how to debug this are greatly appreciated.


On Wed, Jul 20, 2016 at 7:13 PM, Juho Mäkinen <juho.maki...@gmail.com>
wrote:

> I just recently upgraded our cluster to 2.2.7, and after putting the
> cluster under production load the instances started to show high load (as
> shown by uptime) without any apparent reason, and I'm not quite sure what
> could be causing it.
>
> We are running on i2.4xlarge, so we have 16 cores, 120GB of ram, four
> 800GB SSDs (set as lvm stripe into one big lvol). Running 3.13.0-87-generic
> on HVM virtualisation. Cluster has 26 TiB of data stored in two tables.
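>
> (For completeness, a sketch of how the striped lvol is laid out; device
> names are only illustrative:)
>
>   pvcreate /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
>   vgcreate data /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
>   # 4-way stripe, 256 KiB stripe size, using all free space
>   lvcreate -n cassandra -i 4 -I 256 -l 100%FREE data
>   mkfs.ext4 /dev/data/cassandra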
>
> Symptoms:
>  - High load, sometimes up to 30 for a short duration of a few minutes;
> then the load drops back to the cluster average of 3-4.
>  - Instances might have one compaction running, but might not have any.
>  - Each node is serving around 250-300 reads per second and around 200
> writes per second.
>  - Restarting a node fixes the problem for around 18-24 hours.
>  - No or very little IO-wait.
>  - top shows that around 3-10 threads are running on high CPU, but that
> alone should not cause a load of 20-30 (see the sketch after this list).
>  - Doesn't seem to be GC load: a system starts to show symptoms after
> having run only one CMS sweep, so it's not doing constant stop-the-world
> GCs.
>  - top shows that the C* processes use 100G of RSS memory. I assume this is
> because Cassandra opens all SSTables with mmap(), so they show up in the
> RSS count.
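>
> For reference, a rough sketch of how the hot threads could be mapped back
> to JVM thread names, and of how to check what the RSS consists of (pid and
> thread ids are placeholders):
>
>   # per-thread view of the busiest threads inside the Cassandra JVM
>   top -H -p <cassandra-pid>
>
>   # thread dump; jstack prints each native thread id as nid=0x... in hex
>   jstack <cassandra-pid> > /tmp/threads.txt
>   printf '%x\n' <hot-tid-from-top>
>   grep "nid=0x<hot-tid-in-hex>" /tmp/threads.txt
>
>   # confirm the big RSS is mostly mmap'd SSTables rather than heap
>   pmap -x <cassandra-pid> | tail -n 1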
>
> What I've done so far:
>  - Rolling restart. Helped for about one day.
>  - Tried triggering a manual GC on the cluster.
>  - Increased heap from 8 GiB with CMS to 16 GiB with G1GC (see the sketch
> after this list).
>  - sjk-plus shows a bunch of SharedPool workers. Not sure what to make of
> this.
>  - Browsed over
> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html but
> didn't find anything apparent.
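>
> For reference, the heap change above boils down to roughly this in
> cassandra-env.sh (a sketch only; the existing CMS flags also need to be
> commented out):
>
>   MAX_HEAP_SIZE="16G"
>   JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>   JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>
>   # quick check of GC behaviour while a spike is on
>   jstat -gcutil <cassandra-pid> 1000 10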
>
> I know that the general symptom of "system shows high load" is not very
> informative, but I don't know how to better describe what's going on. I
> appreciate all ideas on what to try and how to debug this further.
>
>  - Garo
>
>
