Nodes becoming unresponsive

Surbhi Gupta Wed, 05 Feb 2020 20:30:56 -0800

Hi,

We have noticed that one node in a Cassandra cluster is at 100% CPU
utilization. In top, the Cassandra process shows futex_wait in the
WCHAN column.


We are on CentOS release 6.10 (Final). According to the document below, the
futex bug affected CentOS 6.6.
https://support.datastax.com/hc/en-us/articles/206259833-Nodes-appear-unresponsive-due-to-a-Linux-futex-wait-kernel-bug

Below are the installed patches.

sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref

- [kernel] futex: Mention key referencing differences between shared and private futexes (Larry Woodman) [1167405]
- [kernel] futex: Ensure get_futex_key_refs() always implies a barrier (Larry Woodman) [1167405]
- [kernel] futex: Fix errors in nested key ref-counting (Denys Vlasenko) [1094458] {CVE-2014-0205}
- [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347] {CVE-2010-0623}

top - 21:23:34 up 93 days, 10:43,  1 user,  load average: 137.35, 147.74, 148.52
Tasks: 658 total,   1 running, 657 sleeping,   0 stopped,   0 zombie
Cpu(s): 93.9%us,  1.9%sy,  2.0%ni,  2.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  132236016k total, 129681568k used,  2554448k free,   215888k buffers
Swap:        0k total,        0k used,        0k free, 93679880k cached

   PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM     TIME+  WCHAN     COMMAND
  7725 cassandr  20   0  258g  40g  13g S 2302.0 32.4 305169:26  futex_wai java
 69075 logstash  39  19 10.5g 1.5g  14m S   42.1  1.2   6763:00  futex_wai java
 30008 root      20   0  465m  55m  11m S   11.5  0.0   0:02.78  poll_sche TaniumClient
 31785 cassandr  20   0 34.9g  31m  10m S    4.9  0.0   0:00.15  futex_wai java
  5154 root      20   0 1523m 6260 1300 S    3.0  0.0   1073:05  hrtimer_n collectd
  1129 root      20   0     0    0    0 S    1.3  0.0 294:55.87  kjournald jbd2/dm-0-8
 64173 root      20   0 1512m  71m  13m S    1.3  0.1   0:55.69  futex_wai TaniumClient

Any idea what else we can look at to diagnose the high CPU usage?
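
One possible next step, sketched here under assumptions and not yet run on
the affected node, would be to break the CPU usage of PID 7725 down per
thread and match the busiest threads against a JVM thread dump. HotSpot's
jstack prints each thread's native ID in hex as nid=0x..., while top -H
shows it in decimal, so a small conversion helper is needed. The helper
name tid_to_nid and the /tmp path are made up for illustration.

```shell
# Hypothetical helper: convert a decimal Linux thread ID (as shown by
# `top -H`) to the hex form used in the nid= field of a jstack dump.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Assumed workflow, left commented out because it needs the live node:
#   top -H -b -n 1 -p 7725 | head -n 30        # per-thread CPU of the Cassandra JVM
#   jstack 7725 > /tmp/cassandra-threads.txt   # thread dump, run as the cassandra user
#   grep "nid=$(tid_to_nid <TID>)" /tmp/cassandra-threads.txt   # <TID> taken from top -H
```

If the hottest threads turn out to be GC or compaction threads rather than
threads parked in futex_wait, that would point away from the kernel bug.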

Thanks
Surbhi
