CPU spike to > 100%.
--------------------

                 Key: CASSANDRA-2054
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Thibaut


I see sudden spikes of CPU usage during which Cassandra consumes an enormous 
amount of CPU (uptime load > 1000). 

My application executes both reads and writes.

I tested this with 
https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.

I disabled JNA, but this didn't help.

jstack no longer works once this happens:

-bash-4.1# jstack 27699 > /tmp/jstackerror
27699: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
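
For the record, here is what I plan to try the next time a node is wedged. 
These are standard HotSpot/Linux tools, nothing Cassandra-specific; the PID 
27699 is from the run above and the output path is only an example:

# Force the attach through the serviceability agent when the normal
# attach fails (slower, and it pauses the target process)
jstack -F 27699 > /tmp/jstack-forced

# Or send SIGQUIT so the JVM prints a thread dump to its own stdout/log
kill -3 27699

# Show per-thread CPU to spot the hot threads during the spike; a hot
# thread's ID, printed in hex (printf '%x\n' <tid>), should match an
# "nid=0x..." entry in the jstack dump
top -H -p 27699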

Also, my entire application comes to a halt as long as the node is in this 
state: the node is still marked as up, but it does not respond to any 
requests (Cassandra is consuming all the CPU on the first node).

/software/cassandra/bin/nodetool -h localhost ring
Address       Status  State   Load     Owns    Token
                                               ffffffffffffffff
192.168.0.1   Up      Normal  3.48 GB  5.00%   0cc
192.168.0.2   Up      Normal  3.48 GB  5.00%   199
192.168.0.3   Up      Normal  3.67 GB  5.00%   266
192.168.0.4   Up      Normal  2.55 GB  5.00%   333
192.168.0.5   Up      Normal  2.58 GB  5.00%   400
192.168.0.6   Up      Normal  2.54 GB  5.00%   4cc
192.168.0.7   Up      Normal  2.59 GB  5.00%   599
192.168.0.8   Up      Normal  2.58 GB  5.00%   666
192.168.0.9   Up      Normal  2.33 GB  5.00%   733
192.168.0.10  Down    Normal  2.39 GB  5.00%   7ff
192.168.0.11  Up      Normal  2.4 GB   5.00%   8cc
192.168.0.12  Up      Normal  2.74 GB  5.00%   999
192.168.0.13  Up      Normal  3.17 GB  5.00%   a66
192.168.0.14  Up      Normal  3.25 GB  5.00%   b33
192.168.0.15  Up      Normal  3.01 GB  5.00%   c00
192.168.0.16  Up      Normal  2.48 GB  5.00%   ccc
192.168.0.17  Up      Normal  2.41 GB  5.00%   d99
192.168.0.18  Up      Normal  2.3 GB   5.00%   e66
192.168.0.19  Up      Normal  2.27 GB  5.00%   f33
192.168.0.20  Up      Normal  2.32 GB  5.00%   ffffffffffffffff
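
If it happens again I will also capture the thread-pool stats from the 
wedged node, in case requests are backing up in one particular stage (stock 
nodetool command in 0.7; I don't have this output from the incident above):

/software/cassandra/bin/nodetool -h localhost tpstats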

The interesting part is that after a while (seconds or minutes), I have seen 
Cassandra nodes return to a normal state again (without a restart). I have 
also never seen this happen on two nodes in the cluster at the same time 
(the node where it happens differs, but it seems to happen on the first node 
most of the time).

In the above case, I restarted node 192.168.0.10 and the first node returned 
to a normal state. (I don't know whether there is a correlation.)

I attached the jstack output of the node in trouble (taken as soon as jstack 
could attach again, so I suspect it reflects the node after it had already 
returned to normal).

The heap usage is still moderate:

/software/cassandra/bin/nodetool -h localhost info
0cc
Gossip active    : true
Load             : 3.49 GB
Generation No    : 1295949691
Uptime (seconds) : 42843
Heap Memory (MB) : 1570.58 / 3005.38


I will enable GC logging tomorrow.
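
Concretely, I intend to add the standard HotSpot GC-logging flags to 
conf/cassandra-env.sh; the log path below is only an example:

# Proposed addition to conf/cassandra-env.sh (log path is an example)
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"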

