> If the cache is stored in the heap, how big can the heap be made
> realistically on a 24gb ram machine? I am a java newbie but I have read
> concerns with going over 8gb for the heap as the GC can be too painful/take
> too long. I already have seen timeout issues (node is dead errors) under
> load during GC or compaction. Can/should the heap be set to 16gb with 24gb
> ram?
I have never run Cassandra in production with such a large heap, so I'll let others comment on practical experience with that. In general, however, with the JVM and the CMS garbage collector (which is enabled by default with Cassandra), a large heap is not necessarily a problem, depending on the application's workload.

In terms of GCs taking too long: with the default throughput collector used by the JVM, you will tend to see the longest pause times scale roughly linearly with heap size. Most pauses will still be short (these are what are known as young generation collections), but periodically a so-called full collection is done. With the throughput collector, this implies stopping all Java threads while the *entire* Java heap is garbage collected. With the CMS (Concurrent Mark/Sweep) collector, the intent is that the periodic scans of the entire Java heap are done concurrently with the application, without pausing it. Fallback to full stop-the-world garbage collections can still happen if CMS fails to complete its work fast enough, in which case tweaking of garbage collection settings may be required.

One thing to consider in any case is how much memory you actually need; the more you give to the JVM, the less is left for the OS to cache file contents. If, for example, your true working set in Cassandra is, to grab a random number, 3 GB and you set the heap size to 15 GB, you're wasting a lot of memory by allowing the JVM to postpone GC until it starts approaching the 15 GB mark. That is normally good for overall GC throughput, but not necessarily good overall for something like Cassandra, where there is a direct trade-off with cache eviction in the operating system, possibly causing additional I/O.

Personally I'd be very interested in hearing any stories about running Cassandra nodes with 10+ GB heap sizes, and how well it has worked.
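For concreteness, the heap size and collector choice are controlled by JVM flags, normally set in conf/cassandra-env.sh (or the startup script in older versions). A sketch of the kind of settings involved; the sizes and the occupancy fraction here are illustrative assumptions, not a recommendation:

```shell
# Illustrative JVM settings only; tune to your own workload and RAM.
# Fixed heap well below total physical memory, leaving room for the
# OS page cache:
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"

# CMS: collect the old generation concurrently with the application:
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"

# Start concurrent cycles at a fixed old-gen occupancy instead of
# letting the JVM guess, so CMS begins early enough to finish before
# the heap fills (reducing the risk of a stop-the-world fallback):
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
```

Lowering the initiating occupancy fraction trades some throughput for a smaller chance that CMS loses the race against allocation and falls back to a full stop-the-world collection.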
My gut feeling is that it should work reasonably well, but I have no evidence of that and I may very well be wrong. Anyone?

(On a related note, my limited testing with the G1 collector with Cassandra has indicated it works pretty well. However, I'm concerned about the weak-reference/finalization based cleanup of compacted sstables, since the G1 collector will be much less deterministic about when a particular object gets collected. Has anyone deployed Cassandra with G1 on very large heaps under real load?)

-- 
/ Peter Schuller