Hi Abhishek,

The article describing the futex bug also lists the solution: upgrade to a version of RHEL or CentOS that includes the specified patch.
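To check whether a given kernel already carries the patch, the changelog query can be wrapped in a small script. This is only a minimal sketch, not something taken from the article: the function names are made up for illustration, and the default search pattern `get_futex_key_refs` is an assumption about how the fix is worded in the kernel changelog, so confirm it against the article before relying on the result.

```shell
#!/bin/sh
# Sketch: report whether a kernel changelog (read from stdin) mentions the
# futex fix. FIX_PATTERN is an assumption, not taken from this thread; confirm
# the exact changelog wording against the linked article and override it if
# needed, e.g. FIX_PATTERN='some other text'.
FIX_PATTERN="${FIX_PATTERN:-get_futex_key_refs}"

# Hypothetical helper: succeeds if any changelog line on stdin matches.
has_futex_fix() {
    # grep -q prints nothing; it exits 0 on a match and non-zero otherwise.
    grep -q -e "$FIX_PATTERN"
}

# Hypothetical helper: turn the exit status into a one-line verdict.
check_changelog() {
    if has_futex_fix; then
        echo "futex fix present"
    else
        echo "futex fix missing: upgrade the kernel"
    fi
}
```

On an RPM-based host it could be used as `rpm -q --changelog "kernel-$(uname -r)" | check_changelog`. Reading the changelog from stdin keeps the check separate from the `rpm` query, so it can also be run against saved changelog output from another machine.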
What help do you specifically need? If you need help upgrading the OS, I would look at the documentation for RHEL or CentOS.

Ben

On Mon, 14 Nov 2016 at 22:48 Abhishek Gupta <gupta.abhis...@snapdeal.com> wrote:

Hi,

We are seeing an issue where the system CPU shoots up to a figure of > 90% when the cluster is subjected to a relatively high write workload, i.e. 4k wreq/sec.

    2016-11-14T13:27:47.900+0530 Process summary
      process cpu=695.61%
      application cpu=676.11% (user=200.63% sys=475.49%)   <== Very High System CPU
      other: cpu=19.49%
      heap allocation rate 403mb/s
    [000533] user= 1.43% sys= 6.91% alloc= 2216kb/s - SharedPool-Worker-129
    [000274] user= 0.38% sys= 7.78% alloc= 2415kb/s - SharedPool-Worker-34
    [000292] user= 1.24% sys= 6.77% alloc= 2196kb/s - SharedPool-Worker-56
    [000487] user= 1.24% sys= 6.69% alloc= 2260kb/s - SharedPool-Worker-79
    [000488] user= 1.24% sys= 6.56% alloc= 2064kb/s - SharedPool-Worker-78
    [000258] user= 1.05% sys= 6.66% alloc= 2250kb/s - SharedPool-Worker-41

Running strace showed that the following system call (futex) is consuming all the system CPU:

    timeout 10s strace -f -p 5954 -c -q

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     88.33 1712.798399       16674    102723     22191 futex
      3.98   77.098730        4356     17700           read
      3.27   63.474795      394253       161        29 restart_syscall
      3.23   62.601530       29768      2103           epoll_wait

On searching, we found the following bug in the RHEL 6.6 / CentOS 6.6 kernel, which seems to be a probable cause of the issue:

https://docs.datastax.com/en/landing_page/doc/landing_page/troubleshooting/cassandra/fetuxWaitBug.html

The patch mentioned in the doc is also not present in our kernel:

    sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
    - [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347] {CVE-2010-0623}

Can someone who has faced and resolved this issue help us here?

Thanks,
Abhishek

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer