Cpu Spike to > 100%.
---------------------

                 Key: CASSANDRA-2054
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Thibaut
I see sudden spikes of cpu usage where cassandra will take up an enormous amount of cpu (uptime load > 1000). My application executes both reads and writes.

I tested this with https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz. I disabled JNA, but this didn't help.

Jstack won't work anymore when this happens:

-bash-4.1# jstack 27699 > /tmp/jstackerror
27699: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

Also, my entire application comes to a halt as long as the node is in this state: the node is still marked as up, but won't respond to any requests (cassandra is taking up all the cpu on the first node).

/software/cassandra/bin/nodetool -h localhost ring
Address         Status State   Load     Owns    Token
                                                ffffffffffffffff
192.168.0.1     Up     Normal  3.48 GB  5.00%   0cc
192.168.0.2     Up     Normal  3.48 GB  5.00%   199
192.168.0.3     Up     Normal  3.67 GB  5.00%   266
192.168.0.4     Up     Normal  2.55 GB  5.00%   333
192.168.0.5     Up     Normal  2.58 GB  5.00%   400
192.168.0.6     Up     Normal  2.54 GB  5.00%   4cc
192.168.0.7     Up     Normal  2.59 GB  5.00%   599
192.168.0.8     Up     Normal  2.58 GB  5.00%   666
192.168.0.9     Up     Normal  2.33 GB  5.00%   733
192.168.0.10    Down   Normal  2.39 GB  5.00%   7ff
192.168.0.11    Up     Normal  2.4 GB   5.00%   8cc
192.168.0.12    Up     Normal  2.74 GB  5.00%   999
192.168.0.13    Up     Normal  3.17 GB  5.00%   a66
192.168.0.14    Up     Normal  3.25 GB  5.00%   b33
192.168.0.15    Up     Normal  3.01 GB  5.00%   c00
192.168.0.16    Up     Normal  2.48 GB  5.00%   ccc
192.168.0.17    Up     Normal  2.41 GB  5.00%   d99
192.168.0.18    Up     Normal  2.3 GB   5.00%   e66
192.168.0.19    Up     Normal  2.27 GB  5.00%   f33
192.168.0.20    Up     Normal  2.32 GB  5.00%   ffffffffffffffff

The interesting part is that after a while (seconds or minutes), I have seen cassandra nodes return to a normal state again (without a restart). I have also never seen this happen on two nodes at the same time in the cluster (the node where it happens differs, but it seems to happen on the first node most of the time). In the above case, I restarted node 192.168.0.10 and the first node returned to a normal state. (I don't know if there is a correlation.)

I attached the jstack output of the node in trouble (taken as soon as I could access it with jstack, but I suspect this is from when the node was running normally again).

The heap usage is still moderate:

/software/cassandra/bin/nodetool -h localhost info
0cc
Gossip active    : true
Load             : 3.49 GB
Generation No    : 1295949691
Uptime (seconds) : 42843
Heap Memory (MB) : 1570.58 / 3005.38

I will enable the GC logging tomorrow.
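A minimal sketch of how GC logging is typically enabled on a node of this era, assuming Java 6 HotSpot and the stock conf/cassandra-env.sh (the log path below is only an example):

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"

If the spikes line up with full GCs or long application-stopped times in that log, the stall is GC-related; if the log stays quiet while the load climbs, the CPU is going somewhere else.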
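Not in the original report, but a standard Linux technique for pinning down which thread is burning the CPU while the node is wedged (assuming jstack can be coaxed into responding, e.g. with -F, and that the dump path below is a placeholder): list per-thread CPU with top, convert the hottest thread id to hex, and match it against the nid= field in the jstack dump.

top -H -p 27699                      # per-thread CPU for the cassandra pid from above
printf '%x\n' 27712                  # 27712 is a hypothetical hot TID taken from top
grep 'nid=0x6c40' /tmp/jstack.out    # find the matching thread in the jstack dump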