Is DC1 a simple standby DC, or do you run some operations (e.g. compute
for analysis) on it? Have you found the root cause of the OOM? Do you
see any specific Cassandra operation (e.g. repair) causing it?
One tip: try upgrading to 3.11.6, as lots of bugs have been fixed since
3.11.0.
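
On the FailureDetector line quoted below: both numbers are in
nanoseconds, so the node paused locally for roughly 9.5 seconds against
a 5 second threshold. Long GC pauses are a common cause of this with
large heaps, though that is only a guess without GC logs. A minimal
sketch for pulling these values out of system.log follows; the helper
name and regex are my own, and it assumes the message sits on a single
line as it would in the actual log file.

```python
import re

# Hypothetical helper (not part of Cassandra). The pattern matches the
# "local pause of <pause> > <threshold>" text from the quoted message.
PAUSE_RE = re.compile(r"local pause of\s+(\d+)\s*>\s*(\d+)")

def parse_local_pause(text):
    """Return (pause_seconds, threshold_seconds), or None if no match."""
    m = PAUSE_RE.search(text)
    if m is None:
        return None
    pause_ns, threshold_ns = (int(g) for g in m.groups())
    # Both values are logged in nanoseconds; convert to seconds.
    return pause_ns / 1e9, threshold_ns / 1e9

line = ("FailureDetector.java:288 - Not marking nodes down due to "
        "local pause of 9532654114 > 5000000000")
pause_s, threshold_s = parse_local_pause(line)
print(f"paused {pause_s:.2f}s (threshold {threshold_s:.0f}s)")
```

Running this over the whole log and comparing the timestamps with the
GC log would show whether the pauses line up with long collections.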

On Wed, Feb 26, 2020, 9:53 PM Krish Donald <gotomyp...@gmail.com> wrote:

> Nodes are going down due to Out of Memory. We are using a 31GB heap in
> DC1, whereas DC2 (which serves the traffic) has a 16GB heap.
> We had to increase the heap in DC1 because DC1 nodes were going down
> due to Out of Memory, but DC2 nodes never went down.
>
> We also noticed messages like the following in system.log:
> FailureDetector.java:288 - Not marking nodes down due to local pause of
> 9532654114 > 5000000000
>
>
>
> On Tue, Feb 25, 2020 at 9:43 PM Erick Ramirez <erick.rami...@datastax.com>
> wrote:
>
>> What's the reason for nodes going down? Is it because the cluster is
>> overloaded? Hints will get handed off periodically when nodes come back to
>> life but if they happen to go down again or become unresponsive (for
>> whatever reason), the handoff will be delayed until the next cycle. I think
>> it's every 5 minutes but don't quote me.
>>
>> Hinted MV updates can be problematic so it is a symptom but with limited
>> info, I'm not sure that it's the cause for slow handoffs. Cheers!
>>
>>>
