We can get Cassandra to run great for a few hours now. Writes to and reads from Cassandra work well, and the read/write times are good. We also changed our config to enable row caching (we're hoping to ditch our memcache server layer entirely).
Unfortunately, running on an EC2 High-Memory Extra Large instance with the commit log in batch mode led to huge iowait on the CPU with only 20% of our traffic. We don't have the commit log on a different disk yet, but it still seemed much higher than it should have been. On Jonathan's recommendation we changed to periodic mode in storage-conf.xml. This fixed the iowait problem, but the machines went down hard after a few million writes.

Unfortunately I don't have any JMX or JVM-level debugging (other than command line stuff), so I don't have a ton of insight yet as to why it choked. The main symptoms are memory dropping to zero and the CPU shooting up to 100% very suddenly. Typically the CPU shot up to 100% at roughly the same time on all machines.

We have two hypotheses:
- our PHP client is leaking connections somehow
- the GC kicks in and has so much memory to clean up (the heap is at 12 GB) that it takes forever, and while the GC is running and eating CPU something else goes wrong.

I'm hooking up jcollectd to Cassandra to see if we can find out more. If anyone has any other suggestions please let me know.

C

--
Curt, ZipZapPlay Inc., www.PlayCrafter.com,
http://apps.facebook.com/bakinglife
http://apps.facebook.com/happyhabitat

On Fri, May 21, 2010 at 12:53 PM, S Ahmed <sahmed1...@gmail.com> wrote:

> curious how did things turn out?
>
> On Tue, May 18, 2010 at 1:38 PM, Curt Bererton <c...@zipzapplay.com> wrote:
>
>> We only have a few CFs (6 or 7). I've increased the MemtableThroughputInMB
>> and MemtableOperationsInMillions as per your suggestions. Do we really
>> need a swap file though? I suppose it can't hurt, but with my problem in
>> particular we weren't maxing out main memory.
>>
>> We'll be running another test today and see if the settings changes
>> proposed so far fix our problem (I hope so).
>>
>> Best,
>> Curt
>>
>> On Tue, May 18, 2010 at 5:59 AM, Lee Parker <l...@socialagency.com> wrote:
>>
>>> How many different CFs do you have? If you only have a few, I would
>>> highly recommend increasing the MemtableThroughputInMB and
>>> MemtableOperationsInMillions.
>>> We only have two CFs and I have it set at 256MB and 2.5m. Since most of our
>>> columns are relatively small, these values are practically equivalent to
>>> each other. I would also recommend dropping your heap space to 6G and
>>> adding a swap file. In our case, the large EC2 instances didn't have any
>>> swap setup by default.
>>>
>>> Lee Parker
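
Lee's remark that 256 MB and 2.5 million operations are "practically equivalent" for small columns can be checked with a little arithmetic. The Python sketch below is only a rough model: it assumes a memtable is flushed as soon as either limit is reached, that each operation writes about one column, and that MemtableThroughputInMB counts binary megabytes. The function names and the sample column sizes are made up for illustration and are not part of Cassandra.

```python
# Rough check of how MemtableThroughputInMB and MemtableOperationsInMillions
# relate to each other. Assumption: a flush happens when EITHER limit is hit.

MB = 1024 * 1024  # assumption: the setting counts binary megabytes


def break_even_bytes(throughput_mb: float, ops_millions: float) -> float:
    """Average bytes per write at which both limits are reached together."""
    return throughput_mb * MB / (ops_millions * 1_000_000)


def first_threshold(avg_bytes: float, throughput_mb: float, ops_millions: float) -> str:
    """Name the setting that a memtable would hit first for this write size."""
    if avg_bytes > break_even_bytes(throughput_mb, ops_millions):
        return "MemtableThroughputInMB"
    return "MemtableOperationsInMillions"


if __name__ == "__main__":
    # Lee's settings: 256 MB and 2.5 million operations.
    print(round(break_even_bytes(256, 2.5), 1))  # about 107 bytes per write
    for size in (50, 500):  # hypothetical average column sizes in bytes
        print(size, first_threshold(size, 256, 2.5))
```

Under these assumptions the two limits coincide at roughly 107 bytes per write, so with columns much smaller than that the operation count triggers the flush, and with larger columns the size limit does.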