I'm not sure whether you've had a chance to look at these:
http://www.datastax.com/docs/1.1/troubleshooting/index#oom
http://www.datastax.com/docs/1.1/install/recommended_settings
Can you attach the Cassandra logs and your cassandra.yaml? They should give us more details about the issue.

Thanks,
-VK

On Fri, Dec 6, 2013 at 3:31 PM, Klaus Brunner <klaus.brun...@gmail.com> wrote:

> We're getting fairly reproducible OOMs on a 2-node cluster using
> Cassandra 1.2.11, typically in situations with a heavy read load. A
> sample of some stack traces is at
> https://gist.github.com/KlausBrunner/7820902 - they're all failing
> somewhere down from table.getRow(), though I don't know if that's part
> of query processing, compaction, or something else.
>
> - The main CFs contain some 100k rows, none of them particularly wide.
> - Heap dumps invariably show a single huge byte array (~1.6 GiB
> associated with the OOM'ing thread) hogging > 80% of the Java heap.
> The array seems to contain all/many rows of one CF.
> - We're moderately certain there's no "killer query" with a huge
> result set involved here, but we can't see exactly what triggers this.
> - We've tried to switch to LeveledCompaction, to no avail.
> - Xms/x is set to some 4 GB.
> - The logs show the usual signs of panic ("flushing memtables") before
> actually OOMing. It seems that this scenario is often or even always
> after a compaction, but it's not quite conclusive.
>
> I'm somewhat worried that Cassandra would read so much data into a
> single contiguous byte[] at any point. Could this be related to
> compaction? Any ideas what we could do about this?
>
> Thanks
>
> Klaus