Since you only have about 7.5GB of memory, it's a really bad idea to set your
max heap to 7GB. Remember, the Java process's total footprint will be larger
than what Xmx allows the heap to grow to, since the permanent generation,
thread stacks, and native allocations all live outside the heap. If the
process outgrows physical memory you can start swapping, which is very bad
(and on your boxes, which run with no swap configured, the kernel's OOM
killer will kill the process instead). As Brandon pointed out, you haven't
exhausted your physical memory yet, but you still want to lower Xmx to
something like 5, maybe 6 GB.
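
For example, here is the JVM_OPTS block you posted below with only the heap
ceiling lowered to 5G; a sketch of the single change being suggested, not a
full tuning pass:

# Arguments to pass to the JVM (unchanged except -Xmx)
JVM_OPTS=" \
        -ea \
        -Xms128M \
        -Xmx5G \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"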
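
On your question about finding the busy thread: jconsole won't show per-thread
CPU, but you can get at it from the shell. A sketch, using the PID 2566 from
your top output (the thread id 2571 below is just a made-up example; substitute
whatever top -H shows as the busiest thread):

# List the JVM's threads with per-thread CPU; note the hottest TID
top -H -p 2566

# Convert that thread id to hex (e.g. 2571 -> a0b) ...
printf '%x\n' 2571

# ... and find the matching stack by its nid= field in a jstack dump
jstack 2566 | grep -A 20 'nid=0xa0b'

If the busy threads turn out to be GC threads, that backs up the GC theory.
It's also worth running nodetool (-h <host>) tpstats on each node to see
whether any of Cassandra's internal stages are still backed up even with
client traffic cut off.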

On Mon, May 17, 2010 at 7:02 PM, Curt Bererton <c...@zipzapplay.com> wrote:

> Here are the current jvm args  and java version:
>
> # Arguments to pass to the JVM
> JVM_OPTS=" \
>         -ea \
>         -Xms128M \
>         -Xmx7G \
>         -XX:TargetSurvivorRatio=90 \
>         -XX:+AggressiveOpts \
>         -XX:+UseParNewGC \
>         -XX:+UseConcMarkSweepGC \
>         -XX:+CMSParallelRemarkEnabled \
>         -XX:+HeapDumpOnOutOfMemoryError \
>         -XX:SurvivorRatio=128 \
>         -XX:MaxTenuringThreshold=0 \
>         -Dcom.sun.management.jmxremote.port=8080 \
>         -Dcom.sun.management.jmxremote.ssl=false \
>         -Dcom.sun.management.jmxremote.authenticate=false"
>
> java -version outputs:
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>
> So pretty much the defaults aside from the 7Gig max heap. CPU is totally
> hammered right now, and it's receiving 0 ops/sec from me since I've
> disconnected it from our application until I can figure out what's going
> on.
>
> running top on the machine I get:
> top - 18:56:32 up 2 days, 20:57,  2 users,  load average: 14.97, 15.24, 15.13
> Tasks:  87 total,   5 running,  82 sleeping,   0 stopped,   0 zombie
> Cpu(s): 40.1%us, 33.9%sy,  0.0%ni,  0.1%id,  0.0%wa,  0.0%hi,  1.3%si, 24.6%st
> Mem:   7872040k total,  3618764k used,  4253276k free,   387536k buffers
> Swap:        0k total,        0k used,        0k free,  1655556k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2566 cassandr  25   0 7906m 639m  10m S  150  8.3   5846:35 java
>
>
> I have jconsole up and running, and the jconsole VM Summary tab says:
>  - Total physical memory: 7,872,040 K
>  - Free physical memory: 4,253,036 K
>  - Total swap space: 0 K
>  - Free swap space: 0 K
>  - Committed virtual memory: 8,096,648 K
>
> Is there a specific thread I can look at in jconsole that might give me a
> clue?  It's weird that it's still at 100% CPU even though it's getting no
> traffic from outside right now.  I suppose it might still be talking across
> the machines though.
>
> Also, stopping and restarting Cassandra on one of the 4 machines brought
> the CPU back down to almost normal levels.
>
> Here's the ring:
>
> Address       Status     Load          Range                                      Ring
>                                        170141183460469231731687303715884105728
> 10.251.XX.XX  Up         2.15 MB       42535295865117307932921825928971026432     |<--|
> 10.250.XX.XX  Up         2.42 MB       85070591730234615865843651857942052864     |   |
> 10.250.XX.XX  Up         2.47 MB       127605887595351923798765477786913079296    |   |
> 10.250.XX.XX  Up         2.46 MB       170141183460469231731687303715884105728    |-->|
>
> Any thoughts?
>
> Best,
>
> Curt
> --
> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
> http://apps.facebook.com/happyhabitat
>
>
> On Mon, May 17, 2010 at 3:51 PM, Mark Greene <green...@gmail.com> wrote:
>
>> Can you provide us with the current JVM args? Also, what kind of workload
>> are you giving the ring (ops/sec)?
>>
>>
>> On Mon, May 17, 2010 at 6:39 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>>
>>> Hello Cassandra users+experts,
>>>
>>> Hopefully someone will be able to point me in the correct direction. We
>>> have Cassandra 0.6.1 working on our test servers and we *thought* everything
>>> was great and ready to move to production. We are currently running a
>>> production ring of 4 large EC2 instances
>>> (http://aws.amazon.com/ec2/instance-types/) with a replication factor of 3
>>> and QUORUM consistency level. We ran a test on 1% of our users, and
>>> everything was writing to and reading from Cassandra fine for the first 3
>>> hours. After that point CPU usage spiked to 100% and stayed there on all 4
>>> machines at once. This smells to me like a GC issue, and I'm looking into
>>> it with jconsole right now. If anyone can help me debug this and get
>>> Cassandra all the way up and running without the CPU spiking, I would be
>>> forever in their debt.
>>>
>>> I suspect that anyone else running Cassandra on large EC2 instances might
>>> just be able to tell me what JVM args they are successfully using in
>>> production, whether they upgraded from 0.6.1 to 0.6.2, and whether they
>>> switched to batched writes because of bug 1014
>>> (https://issues.apache.org/jira/browse/CASSANDRA-1014). That might answer
>>> all my questions.
>>>
>>> Is there anyone on the list who is using large EC2 instances in
>>> production? Would you be kind enough to share your JVM arguments and any
>>> other tips?
>>>
>>> Thanks for any help,
>>> Curt
>>> --
>>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>>> http://apps.facebook.com/happyhabitat
>>>
>>
>>
>
