How many different CFs do you have? If you only have a few, I would highly recommend increasing MemtableThroughputInMB and MemtableOperationsInMillions. We only have two CFs, and I have them set to 256MB and 2.5 million. Since most of our columns are relatively small, those two thresholds end up being practically equivalent for us. I would also recommend dropping your heap space to 6G and adding a swap file (there's a sketch of the adjusted JVM_OPTS at the bottom of this mail). In our case, the large EC2 instances didn't have any swap set up by default.
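To make that concrete, here is roughly what the first change looks like in 0.6's storage-conf.xml, along with one way to add a swap file on an EC2 large instance. Treat it as a sketch: the 256MB/2.5m values are just what we run, and the 2GB size and /mnt path for the swap file are only examples.

<MemtableThroughputInMB>256</MemtableThroughputInMB>
<MemtableOperationsInMillions>2.5</MemtableOperationsInMillions>

# create and enable a 2GB swap file on the instance's ephemeral disk
dd if=/dev/zero of=/mnt/swapfile bs=1M count=2048
mkswap /mnt/swapfile
swapon /mnt/swapfile
# optional: add a line like "/mnt/swapfile none swap sw 0 0" to /etc/fstab
# so the swap survives a reboot

Tune the memtable numbers to your own CFs and column sizes; bigger memtables mean fewer, larger flushes and less compaction churn.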
Lee Parker

On Mon, May 17, 2010 at 7:31 PM, Curt Bererton <c...@zipzapplay.com> wrote:
> Agreed, and I just saw that in storage-conf a higher value for
> MemtableFlushAfterMinutes is suggested, otherwise you might get a "flush
> storm" of all your memtables flushing at once. I've changed that as well.
>
> --
> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
> http://apps.facebook.com/happyhabitat
>
> On Mon, May 17, 2010 at 5:27 PM, Mark Greene <green...@gmail.com> wrote:
>
>> Since you only have 7.5GB of memory, it's a really bad idea to set your
>> heap space to a max of 7GB. Remember, the java process heap will be larger
>> than what Xmx is allowed to grow to. If you reach this level, you can
>> start swapping, which is very, very bad. As Brandon pointed out, you haven't
>> exhausted your physical memory yet, but you still want to lower Xmx to
>> something like 5, maybe 6 GB.
>>
>> On Mon, May 17, 2010 at 7:02 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>>
>>> Here are the current JVM args and Java version:
>>>
>>> # Arguments to pass to the JVM
>>> JVM_OPTS=" \
>>>         -ea \
>>>         -Xms128M \
>>>         -Xmx7G \
>>>         -XX:TargetSurvivorRatio=90 \
>>>         -XX:+AggressiveOpts \
>>>         -XX:+UseParNewGC \
>>>         -XX:+UseConcMarkSweepGC \
>>>         -XX:+CMSParallelRemarkEnabled \
>>>         -XX:+HeapDumpOnOutOfMemoryError \
>>>         -XX:SurvivorRatio=128 \
>>>         -XX:MaxTenuringThreshold=0 \
>>>         -Dcom.sun.management.jmxremote.port=8080 \
>>>         -Dcom.sun.management.jmxremote.ssl=false \
>>>         -Dcom.sun.management.jmxremote.authenticate=false"
>>>
>>> java -version outputs:
>>> java version "1.6.0_20"
>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>
>>> So pretty much the defaults aside from the 7 GB max heap. CPU is totally
>>> hammered right now, and it is receiving 0 ops/sec from me since I
>>> disconnected it from our application until I can figure out what's going on.
>>>
>>> Running top on the machine I get:
>>>
>>> top - 18:56:32 up 2 days, 20:57, 2 users, load average: 14.97, 15.24, 15.13
>>> Tasks:  87 total,   5 running,  82 sleeping,   0 stopped,   0 zombie
>>> Cpu(s): 40.1%us, 33.9%sy, 0.0%ni, 0.1%id, 0.0%wa, 0.0%hi, 1.3%si, 24.6%st
>>> Mem:   7872040k total,  3618764k used,  4253276k free,   387536k buffers
>>> Swap:        0k total,        0k used,        0k free,  1655556k cached
>>>
>>>   PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  2566 cassandr 25   0 7906m 639m  10m S  150  8.3  5846:35  java
>>>
>>> I have jconsole up and running, and the jconsole VM Summary tab says:
>>> - Total physical memory: 7,872,040 K
>>> - Free physical memory: 4,253,036 K
>>> - Total swap space: 0 K
>>> - Free swap space: 0 K
>>> - Committed virtual memory: 8,096,648 K
>>>
>>> Is there a specific thread I can look at in jconsole that might give me a
>>> clue? It's weird that it's still at 100% CPU even though it's getting no
>>> traffic from outside right now. I suppose it might still be talking across
>>> the machines, though.
>>>
>>> Also, stopping and starting Cassandra on one of the 4 machines caused the
>>> CPU to go back down to almost normal levels.
>>>
>>> Here's the ring:
>>>
>>> Address       Status  Load     Range                                     Ring
>>>                                170141183460469231731687303715884105728
>>> 10.251.XX.XX  Up      2.15 MB  42535295865117307932921825928971026432   |<--|
>>> 10.250.XX.XX  Up      2.42 MB  85070591730234615865843651857942052864   |   |
>>> 10.250.XX.XX  Up      2.47 MB  127605887595351923798765477786913079296  |   |
>>> 10.250.XX.XX  Up      2.46 MB  170141183460469231731687303715884105728  |-->|
>>>
>>> Any thoughts?
>>>
>>> Best,
>>>
>>> Curt
>>> --
>>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>>> http://apps.facebook.com/happyhabitat
>>>
>>> On Mon, May 17, 2010 at 3:51 PM, Mark Greene <green...@gmail.com> wrote:
>>>
>>>> Can you provide us with the current JVM args? Also, what type of
>>>> workload are you giving the ring (ops/s)?
>>>>
>>>> On Mon, May 17, 2010 at 6:39 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>>>>
>>>>> Hello Cassandra users+experts,
>>>>>
>>>>> Hopefully someone will be able to point me in the correct direction. We
>>>>> have Cassandra 0.6.1 working on our test servers and we *thought* everything
>>>>> was great and ready to move to production. We are currently running a ring
>>>>> of 4 large-instance EC2 (http://aws.amazon.com/ec2/instance-types/)
>>>>> servers in production with a replication factor of 3 and QUORUM
>>>>> consistency level. We ran a test on 1% of our users, and everything was
>>>>> writing to and reading from Cassandra great for the first 3 hours. After
>>>>> that point CPU usage spiked to 100% and stayed there, basically on all 4
>>>>> machines at once. This smells to me like a GC issue, and I'm looking into it
>>>>> with jconsole right now. If anyone can help me debug this and get Cassandra
>>>>> all the way up and running without the CPU spiking, I would be forever in
>>>>> their debt.
>>>>>
>>>>> I suspect that anyone else running Cassandra on large EC2 instances might
>>>>> just be able to tell me what JVM args they are successfully using in a
>>>>> production environment, whether they upgraded to Cassandra 0.6.2 from 0.6.1,
>>>>> and whether they went to batched writes due to bug 1014
>>>>> (https://issues.apache.org/jira/browse/CASSANDRA-1014). That might answer
>>>>> all my questions.
>>>>>
>>>>> Is there anyone on the list who is using large EC2 instances in
>>>>> production? Would you be kind enough to share your JVM arguments and any
>>>>> other tips?
>>>>>
>>>>> Thanks for any help,
>>>>> Curt
>>>>> --
>>>>> Curt, ZipZapPlay Inc., www.PlayCrafter.com,
>>>>> http://apps.facebook.com/happyhabitat
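
As mentioned at the top: a rough sketch of the JVM_OPTS from Curt's mail with the heap capped at 6G instead of 7G. This isn't a tested recipe, just that one change, plus pinning -Xms to the same value so the heap doesn't resize; every other flag is left exactly as pasted above:

JVM_OPTS=" \
        -ea \
        -Xms6G \
        -Xmx6G \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"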