Hi Guys,

Since the start of our org, Cassandra has been a SPOF. Due to recent
priorities, we changed our code base so that Cassandra is no longer a
SPOF, and during that process we added a kill switch in the code (PHP).
When activated, this kill switch ensures that no connection is made to
Cassandra for any query.

While testing the kill switch, we noticed a strange behaviour: CPU usage
and load average drop from 400% CPU and a load of 14-20 (on a 16-core
machine) to 20% CPU and a load of 2-3.

Even if the kill switch is activated for only 30 seconds, CPU drops from
400% to 20% and stays at 20% for at least 24 hours before it starts to
climb back to 400%, where it then stays. This happens on all the nodes,
not just a few.

Details:
Cassandra Version: 2.2.4
Number of Nodes: 8
AWS Instance Type: c4.4xlarge
Number of Open Files: 30k to 50k (depending on the number of auto-scaled
PHP nodes)

We would be grateful for any explanation of this strange behaviour.

Thanks & Regards
Srinivas Devaki
SRE/SDE at Zomato
