Here are the current JVM args and Java version:

# Arguments to pass to the JVM
JVM_OPTS=" \
        -ea \
        -Xms128M \
        -Xmx7G \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

java -version outputs:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

So pretty much the defaults aside from the 7 GB max heap. CPU is totally
hammered right now, and it is receiving 0 ops/sec from me since I
disconnected it from our application until I can figure out what's going on.
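
In case it really is GC, I'm going to add GC logging to JVM_OPTS so I can
see the pauses directly. These should be the standard HotSpot 1.6 logging
flags; the log path is just a guess at where I'd put it:

# Tentative addition to JVM_OPTS: log GC activity to a file
JVM_OPTS="$JVM_OPTS \
        -verbose:gc \
        -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps \
        -Xloggc:/var/log/cassandra/gc.log"

If the log shows back-to-back CMS cycles or long pauses while the node is
idle, that would pretty much confirm it.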

running top on the machine I get:
top - 18:56:32 up 2 days, 20:57,  2 users,  load average: 14.97, 15.24, 15.13
Tasks:  87 total,   5 running,  82 sleeping,   0 stopped,   0 zombie
Cpu(s): 40.1%us, 33.9%sy,  0.0%ni,  0.1%id,  0.0%wa,  0.0%hi,  1.3%si, 24.6%st
Mem:   7872040k total,  3618764k used,  4253276k free,   387536k buffers
Swap:        0k total,        0k used,        0k free,  1655556k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2566 cassandr  25   0 7906m 639m  10m S  150  8.3   5846:35 java


I have jconsole up and running, and the jconsole VM Summary tab says:
 - Total physical memory: 7,872,040 K
 - Free physical memory: 4,253,036 K
 - Total swap space: 0 K
 - Free swap space: 0 K
 - Committed virtual memory: 8,096,648 K
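
While it's sitting idle I'm also going to watch the collector directly with
jstat, which ships with the JDK (2566 is the java PID from top above):

# Print heap occupancy and GC counts/times every 1000 ms
jstat -gcutil 2566 1000

If the FGC/FGCT columns keep climbing with no traffic, that points straight
at the collector.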

Is there a specific thread I can look at in jconsole that might give me a
clue? It's weird that it's still at 100% CPU even though it's getting no
outside traffic right now. I suppose it might still be talking to the other
machines, though.
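
In the meantime I'll try matching the hottest native threads against a
thread dump. This is the standard HotSpot approach as far as I know; 2566
is the java PID from top, and <hot_thread_pid> is whichever thread top -H
shows burning CPU:

# Per-thread CPU usage for the Cassandra JVM
top -H -p 2566

# Thread dump; the nid=0x... field in the dump is the hex form of the
# PID column from top -H
jstack 2566 > /tmp/threads.txt
printf '%x\n' <hot_thread_pid>   # then grep /tmp/threads.txt for nid=0x<that value>

That should at least tell me which Java thread is eating the CPU.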

Also, stopping and restarting Cassandra on one of the 4 machines caused its
CPU to go back down to almost normal levels.
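
Before I restart the other three I'll also grab the thread pool stats off
one of the still-spinning nodes in case a stage is backed up. I think this
is the 0.6 nodetool syntax, with JMX on 8080 per the args above:

# Show Cassandra's internal thread pool / stage queue counts over JMX
nodetool --host localhost --port 8080 tpstats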

Here's the ring:
Address       Status     Load      Range                                      Ring
                                   170141183460469231731687303715884105728
10.251.XX.XX  Up         2.15 MB   42535295865117307932921825928971026432     |<--|
10.250.XX.XX  Up         2.42 MB   85070591730234615865843651857942052864     |   |
10.250.XX.XX  Up         2.47 MB   127605887595351923798765477786913079296    |   |
10.250.XX.XX  Up         2.46 MB   170141183460469231731687303715884105728    |-->|

Any thoughts?

Best,
Curt
--
Curt, ZipZapPlay Inc., www.PlayCrafter.com,
http://apps.facebook.com/happyhabitat


On Mon, May 17, 2010 at 3:51 PM, Mark Greene <green...@gmail.com> wrote:

> Can you provide us with the current JVM args? Also, what type of workload
> are you giving the ring (op/s)?
>
>
> On Mon, May 17, 2010 at 6:39 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>
>> Hello Cassandra users+experts,
>>
>> Hopefully someone will be able to point me in the correct direction. We
>> have cassandra 0.6.1 working on our test servers and we *thought* everything
>> was great and ready to move to production. We are currently running a ring
>> of 4 large instance EC2 (http://aws.amazon.com/ec2/instance-types/)
>> servers on production with a replication factor of 3 and a QUORUM
>> consistency level. We ran a test on 1% of our users, and everything was
>> writing to and reading from cassandra great for the first 3 hours. After
>> that point CPU usage spiked to 100% and stayed there, basically on all 4
>> machines at once. This smells to me like a GC issue, and I'm looking into it
>> with jconsole right now. If anyone can help me debug this and get cassandra
>> all the way up and running without CPU spiking I would be forever in their
>> debt.
>>
>> I suspect that anyone else running cassandra on large EC2 instances might
>> just be able to tell me what JVM args they are successfully using in a
>> production environment and if they upgraded to Cassandra 0.6.2 from 0.6.1,
>> and did they go to batched writes due to bug 1014? (
>> https://issues.apache.org/jira/browse/CASSANDRA-1014) That might answer
>> all my questions.
>>
>> Is there anyone on the list who is using large EC2 instances in
>> production? Would you be kind enough to share your JVM arguments and any
>> other tips?
>>
>> Thanks for any help,
>> Curt
>> --
>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>> http://apps.facebook.com/happyhabitat
>>
>
>
