It seems the line "ROW-MUTATION-STAGE 32 3349 63897493" (32 active, 3349 pending) is the clue: too many mutation requests are pending.
Yes, I also think Cassandra should add a mechanism to keep too many requests from piling up in the queue: when the queue is full, just reject the request from the client. It seems https://issues.apache.org/jira/browse/CASSANDRA-685 is what we want.

On Tue, Apr 27, 2010 at 8:16 PM, Eric Yu <[email protected]> wrote:

> I wrote a script to record the tpstats output every 5 seconds.
> Here is the output just before the JVM OOM:
>
> Pool Name                 Active   Pending  Completed
> FILEUTILS-DELETE-POOL          0         0        280
> STREAM-STAGE                   0         0          0
> RESPONSE-STAGE                 0         0     245573
> ROW-READ-STAGE                 0         0          0
> LB-OPERATIONS                  0         0          0
> MESSAGE-DESERIALIZER-POOL      1  14290091   65943291
> GMFD                           0         0      26670
> LB-TARGET                      0         0          0
> CONSISTENCY-MANAGER            0         0          0
> ROW-MUTATION-STAGE            32      3349   63897493
> MESSAGE-STREAMING-POOL         0         0          3
> LOAD-BALANCER-STAGE            0         0          0
> FLUSH-SORTER-POOL              0         0          0
> MEMTABLE-POST-FLUSHER          0         0        420
> FLUSH-WRITER-POOL              0         0        420
> AE-SERVICE-STAGE               1         1          4
> HINTED-HANDOFF-POOL            0         0         52
>
>
> On Tue, Apr 27, 2010 at 10:53 AM, Chris Goffinet <[email protected]> wrote:
>
>> I'll work on doing more tests around this. In 0.5 we used a different data
>> structure that required polling. But this does seem problematic.
>>
>> -Chris
>>
>> On Apr 26, 2010, at 7:04 PM, Eric Yu wrote:
>>
>> I have the same problem here, and I analyzed the hprof file with MAT; as
>> you said, LinkedBlockingQueue used 2.6GB.
>> I think the ThreadPool of Cassandra should limit the queue size.
>>
>> cassandra 0.6.1
>>
>> java version
>> $ java -version
>> java version "1.6.0_20"
>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>
>> iostat
>> $ iostat -x -l 1
>> Device:  rrqm/s   wrqm/s     r/s    w/s     rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
>> sda       81.00  8175.00  224.00  17.00  23984.00  2728.00   221.68     1.01   1.86   0.76  18.20
>>
>> tpstats (of course, this node is still alive)
>> $ ./nodetool -host localhost tpstats
>> Pool Name                 Active   Pending  Completed
>> FILEUTILS-DELETE-POOL          0         0       1281
>> STREAM-STAGE                   0         0          0
>> RESPONSE-STAGE                 0         0  473617241
>> ROW-READ-STAGE                 0         0          0
>> LB-OPERATIONS                  0         0          0
>> MESSAGE-DESERIALIZER-POOL      0         0  718355184
>> GMFD                           0         0     132509
>> LB-TARGET                      0         0          0
>> CONSISTENCY-MANAGER            0         0          0
>> ROW-MUTATION-STAGE             0         0  293735704
>> MESSAGE-STREAMING-POOL         0         0          6
>> LOAD-BALANCER-STAGE            0         0          0
>> FLUSH-SORTER-POOL              0         0          0
>> MEMTABLE-POST-FLUSHER          0         0       1870
>> FLUSH-WRITER-POOL              0         0       1870
>> AE-SERVICE-STAGE               0         0          5
>> HINTED-HANDOFF-POOL            0         0         21
>>
>>
>> On Tue, Apr 27, 2010 at 3:32 AM, Chris Goffinet <[email protected]> wrote:
>>
>>> Upgrade to b20 of Sun's version of JVM. This OOM might be related to
>>> LinkedBlockingQueue issues that were fixed.
>>>
>>> -Chris
>>>
>>>
>>> 2010/4/26 Roland Hänel <[email protected]>
>>>
>>>> Cassandra Version 0.6.1
>>>> OpenJDK Server VM (build 14.0-b16, mixed mode)
>>>> Import speed is about 10MB/s for the full cluster; if a compaction is
>>>> going on the individual node is I/O limited
>>>> tpstats: caught me, didn't know this. I will set up a test and try to
>>>> catch a node during the critical time.
>>>>
>>>> Thanks,
>>>> Roland
>>>>
>>>>
>>>> 2010/4/26 Chris Goffinet <[email protected]>
>>>>
>>>>> Which version of Cassandra?
>>>>> Which version of Java JVM are you using?
>>>>> What do your I/O stats look like when bulk importing?
>>>>> When you run `nodeprobe -host XXXX tpstats` is any thread pool backing
>>>>> up during the import?
>>>>>
>>>>> -Chris
>>>>>
>>>>>
>>>>> 2010/4/26 Roland Hänel <[email protected]>
>>>>>
>>>>>> I have a cluster of 5 machines building a Cassandra datastore, and I
>>>>>> load bulk data into this using the Java Thrift API. The first ~250GB
>>>>>> runs fine, then one of the nodes starts to throw OutOfMemory
>>>>>> exceptions. I'm not using any row or index caches, and since I only
>>>>>> have 5 CFs and some 2.5 GB of RAM allocated to the JVM (-Xmx2500M),
>>>>>> in theory, that shouldn't happen. All inserts are done with
>>>>>> consistency level ALL.
>>>>>>
>>>>>> I hope with this I have avoided all the 'usual dummy errors' that lead
>>>>>> to OOMs. I have begun to troubleshoot the issue with JMX; however, it's
>>>>>> difficult to catch the JVM at the right moment because it runs well for
>>>>>> several hours before this thing happens.
>>>>>>
>>>>>> One thing comes to mind; maybe one of the experts could confirm or
>>>>>> reject this idea for me: is it possible that when one machine slows
>>>>>> down a little bit (for example because a big compaction is going on),
>>>>>> the memtables don't get flushed to disk as fast as they are building
>>>>>> up under the continuing bulk import? That would result in a downward
>>>>>> spiral: the system gets slower and slower on disk I/O, but since more
>>>>>> and more data arrives over Thrift, finally OOM.
>>>>>>
>>>>>> I'm using the "periodic" commit log sync; maybe this could also create
>>>>>> a situation where the commit log writer is too slow to catch up with
>>>>>> the data intake, resulting in ever-growing memory usage?
>>>>>>
>>>>>> Maybe these thoughts are just bullshit. Let me know if so... ;-)
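
P.S. For what it's worth, the "reject when the queue is full" idea from the top of this message can be sketched with plain java.util.concurrent. This is only a minimal illustration under my own assumptions: it is not Cassandra's stage code and not the CASSANDRA-685 patch, and the class name BoundedStageSketch, the helper names, and the thread/queue sizes are all made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a stage with a bounded pending queue. */
class BoundedStageSketch {

    /** Fixed-size pool whose pending queue holds at most queueCapacity tasks. */
    static ThreadPoolExecutor newBoundedStage(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                // A capacity-limited queue instead of an unbounded one.
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                // AbortPolicy throws RejectedExecutionException when full.
                new ThreadPoolExecutor.AbortPolicy());
    }

    /**
     * Submits 'attempts' tasks that block until 'release' is counted down,
     * simulating a slow stage, and returns how many were rejected.
     */
    static int countRejections(ThreadPoolExecutor stage, int attempts,
                               CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                stage.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                // Queue is full: push back on the caller instead of buffering.
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor stage = newBoundedStage(2, 4);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = countRejections(stage, 10, release);
        System.out.println("rejected=" + rejected
                + " pending=" + stage.getQueue().size());
        release.countDown(); // let the blocked tasks finish
        stage.shutdown();
    }
}
```

With 2 threads and a queue capacity of 4, ten blocked submissions should leave 2 tasks running, 4 pending, and 4 rejected. A real server would presumably turn the rejection into an error returned to the client rather than just counting it, so memory use stays bounded when a node falls behind.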
