Hi,

I'd recommend tuning your heap size further (preferably lower), as a large
heap can lead to long garbage collection pauses, also known as
stop-the-world events. A pause occurs when a region of memory is full and
the JVM needs to make space to continue. During a pause all operations are
suspended. Because a pause affects networking, the node can appear as down
to other nodes in the cluster. Additionally, any SELECT and INSERT
statements will wait, which increases read and write latencies.

Any pause of more than a second, or multiple pauses within a second that
add up to a large fraction of that second, should be avoided. The basic
cause of the problem is that the rate at which data is stored in memory
outpaces the rate at which it can be removed.
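
As a rough way to apply that rule of thumb to pause times pulled from your
GC logs, here is a minimal sketch. The input format, the function name and
the 0.5 threshold for "a large fraction" are my own assumptions, and each
pause is simply attributed to the second in which it starts:

```python
from collections import defaultdict


def flag_bad_seconds(pauses, max_fraction=0.5):
    """Return the wall-clock seconds whose GC pauses look problematic.

    pauses: iterable of (start_s, duration_ms) tuples (hypothetical format,
            e.g. extracted by hand from GC log lines).
    max_fraction: assumed cutoff for "a large fraction of a second".
    """
    paused_ms_per_second = defaultdict(float)
    flagged = set()

    for start_s, duration_ms in pauses:
        second = int(start_s)
        # A single pause of one second or more is always flagged.
        if duration_ms >= 1000:
            flagged.add(second)
        # Simplification: the whole pause counts toward its starting second.
        paused_ms_per_second[second] += duration_ms

    for second, total_ms in paused_ms_per_second.items():
        if total_ms / 1000.0 >= max_fraction:
            flagged.add(second)

    return sorted(flagged)


if __name__ == "__main__":
    sample = [(10.1, 200), (10.5, 350), (11.0, 50), (12.2, 1500)]
    print(flag_bad_seconds(sample))
```

If most seconds get flagged while repair is running, that points at heap
pressure rather than at a one-off pause.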

MUTATION: If a write message is processed after its timeout
(write_request_timeout_in_ms), the coordinator has either already sent a
failure to the client, or the write met its requested consistency level. In
the second case, the node relies on hinted handoff and read repair to apply
the mutation later.
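
To make the timing condition concrete, here is a small sketch of the check
described above. This is not Cassandra's code: the function and parameter
names are made up for illustration, and the 2000 ms default is only my
assumption for write_request_timeout_in_ms, so check your cassandra.yaml.

```python
def is_dropped_mutation(received_at_ms, processed_at_ms, write_timeout_ms=2000):
    """Return True if a write would count as dropped under the rule above.

    received_at_ms:   when the node received the write message (hypothetical).
    processed_at_ms:  when the node actually got around to processing it.
    write_timeout_ms: stands in for write_request_timeout_in_ms; 2000 is an
                      assumed value, not read from any real configuration.
    """
    waited_ms = processed_at_ms - received_at_ms
    return waited_ms > write_timeout_ms


if __name__ == "__main__":
    # A write that waited behind a 2.5 s stop-the-world pause is dropped.
    print(is_dropped_mutation(received_at_ms=0, processed_at_ms=2500))
    # A write processed within the timeout is applied normally.
    print(is_dropped_mutation(received_at_ms=0, processed_at_ms=1500))
```

So the drop happens before the write reaches the commit log or memtable on
that node; anything that delays processing past the timeout, such as a long
GC pause or a slow disk, can trigger it.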

Another possible cause of the issue could be your HDDs, as they too could
be a bottleneck.

*MAX_HEAP_SIZE*
The recommended maximum heap size depends on which GC is used:

Hardware setup                                            Recommended MAX_HEAP_SIZE
Older computers                                           Typically 8 GB
CMS for newer computers (8+ cores) with up to 256 GB RAM  No more than 14 GB


Thanks,
Hitesh dua
hiteshd...@gmail.com

On Wed, Apr 18, 2018 at 10:07 PM, shalom sagges <shalomsag...@gmail.com>
wrote:

> Hi All,
>
> I have a 44 node cluster (22 nodes on each DC).
> Each node has 24 cores and 130 GB RAM, 3 TB HDDs.
> Version 2.0.14 (soon to be upgraded)
> ~10K writes per second per node.
> Heap size: 8 GB max, 2.4 GB newgen
>
> I deployed Reaper and GC started to increase rapidly. I'm not sure if it's
> because there was a lot of inconsistency in the data, but I decided to
> increase the heap to 16 GB and new gen to 6 GB. I increased the max tenure
> from 1 to 5.
>
> I tested on a canary node and everything was fine but when I changed the
> entire DC, I suddenly saw a lot of dropped mutations in the logs on most of
> the nodes. (Reaper was not running on the cluster yet but a manual repair
> was running).
>
> Can the heap increment cause lots of dropped mutations?
> When is a mutation considered as dropped? Is it during flush? Is it during
> the write to the commit log or memtable?
>
> Thanks!
>
>
>
>
