Re: Cassandra cluster runs into OOM when bulk loading data

Roland Hänel Wed, 28 Apr 2010 00:46:23 -0700

There are other threads linked to this issue. Most notably, I think we're
hitting https://issues.apache.org/jira/browse/CASSANDRA-1014 here.


2010/4/27 Schubert Zhang <zson...@gmail.com>

> It seems that
>
> ROW-MUTATION-STAGE   32      3349       63897493
>
> is the clue: too many mutation requests are pending.
>
>
> Yes, I also think Cassandra should add a mechanism to avoid having too many
> requests pending in its queues: when a queue is full, it should simply reject
> the request from the client.
>
> It seems https://issues.apache.org/jira/browse/CASSANDRA-685 is what we want.
>
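The patch behind CASSANDRA-685 is not shown in this thread. As a rough sketch of the idea only, assuming a plain java.util.concurrent ThreadPoolExecutor rather than Cassandra's own stage classes, a stage whose queue holds at most a fixed number of tasks and refuses the rest could look like this. The class and method names are hypothetical, and the lambdas need Java 8 or later (newer than the 1.6 JVMs discussed here).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedStage {
    // Hypothetical stage: fixed-size pool whose work queue holds at most
    // maxPending tasks. AbortPolicy makes execute() throw
    // RejectedExecutionException once the queue is full, so the caller can
    // fail the request fast instead of buffering it on the heap.
    static ThreadPoolExecutor newBoundedStage(int threads, int maxPending) {
        return new ThreadPoolExecutor(
                threads, threads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(maxPending),
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits `count` tasks that all wait on `release`; returns how many were rejected.
    static int submitAll(ThreadPoolExecutor stage, int count, CountDownLatch release) {
        int rejected = 0;
        for (int i = 0; i < count; i++) {
            try {
                stage.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // in a server, this is where the client would get an error
            }
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor stage = newBoundedStage(1, 2);
        CountDownLatch release = new CountDownLatch(1);
        // One task runs, two wait in the queue, the remaining seven are refused.
        int rejected = submitAll(stage, 10, release);
        System.out.println("rejected=" + rejected);
        release.countDown();
        stage.shutdown();
        stage.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Memory use is then bounded by the queue capacity rather than by how long the overload lasts; the cost is that clients must handle and retry the rejections.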
>
>
> On Tue, Apr 27, 2010 at 8:16 PM, Eric Yu <suc...@gmail.com> wrote:
>
>> I wrote a script to record the tpstats output every 5 seconds.
>> Here is the output just before the JVM OOM:
>>
>>
>> Pool Name                    Active   Pending      Completed
>> FILEUTILS-DELETE-POOL             0         0            280
>> STREAM-STAGE                      0         0              0
>> RESPONSE-STAGE                    0         0         245573
>> ROW-READ-STAGE                    0         0              0
>> LB-OPERATIONS                     0         0              0
>> MESSAGE-DESERIALIZER-POOL         1  14290091       65943291
>> GMFD                              0         0          26670
>> LB-TARGET                         0         0              0
>> CONSISTENCY-MANAGER               0         0              0
>> ROW-MUTATION-STAGE               32      3349       63897493
>> MESSAGE-STREAMING-POOL            0         0              3
>> LOAD-BALANCER-STAGE               0         0              0
>> FLUSH-SORTER-POOL                 0         0              0
>> MEMTABLE-POST-FLUSHER             0         0            420
>> FLUSH-WRITER-POOL                 0         0            420
>> AE-SERVICE-STAGE                  1         1              4
>> HINTED-HANDOFF-POOL               0         0             52
>>
>>
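Eric's recording script is not shown. A hypothetical Java helper (not part of nodetool) that scans tpstats rows like the ones above and reports pools whose Pending column exceeds a threshold might look like the following. It assumes the four-column layout shown here (pool name, Active, Pending, Completed), pool names without spaces, and lines with the email quote markers already stripped; List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class TpstatsBacklog {
    private static boolean isCount(String s) {
        return s.matches("\\d+");
    }

    // Returns the names of pools whose Pending column exceeds `threshold`.
    // Assumes each data row is "<pool> <active> <pending> <completed>",
    // whitespace-separated, as in the tpstats output above.
    static List<String> backedUpPools(List<String> lines, long threshold) {
        List<String> result = new ArrayList<>();
        for (String raw : lines) {
            String[] cols = raw.trim().split("\\s+");
            if (cols.length != 4 || !isCount(cols[1]) || !isCount(cols[2]) || !isCount(cols[3])) {
                continue; // header row, blank line, or unexpected format
            }
            if (Long.parseLong(cols[2]) > threshold) {
                result.add(cols[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Pool Name                    Active   Pending      Completed",
                "MESSAGE-DESERIALIZER-POOL         1  14290091       65943291",
                "ROW-MUTATION-STAGE               32      3349       63897493",
                "AE-SERVICE-STAGE                  1         1              4");
        System.out.println(backedUpPools(sample, 1000));
    }
}
```

Applied to the snapshot above with a threshold of 1000, this flags both MESSAGE-DESERIALIZER-POOL and ROW-MUTATION-STAGE; the deserializer backlog is by far the larger of the two.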
>> On Tue, Apr 27, 2010 at 10:53 AM, Chris Goffinet <goffi...@digg.com> wrote:
>>
>>> I'll work on doing more tests around this. In 0.5 we used a different
>>> data structure that required polling. But this does seem problematic.
>>>
>>>  -Chris
>>>
>>> On Apr 26, 2010, at 7:04 PM, Eric Yu wrote:
>>>
>>> I have the same problem here. I analyzed the hprof file with MAT and, as
>>> you said, LinkedBlockingQueue used 2.6 GB.
>>> I think Cassandra's thread pools should limit their queue sizes.
>>>
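Limiting the queue size does not have to mean rejecting work. A different technique from the reject-when-full idea discussed above is to make the submitting thread wait until the queue has room, which pushes the slowdown back toward the sender. The sketch below assumes a plain ThreadPoolExecutor and a custom RejectedExecutionHandler; it is not how Cassandra's stages are implemented, the names are hypothetical, and the lambdas need Java 8 or later.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingStage {
    // Instead of throwing when the queue is full, block the submitting thread
    // until a worker frees a slot. The producer is slowed to the consumers'
    // pace, so the queue (and the heap it occupies) stays bounded.
    static final RejectedExecutionHandler BLOCK_CALLER = (task, executor) -> {
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("stage is shut down");
        }
        try {
            executor.getQueue().put(task); // waits while the queue is full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("interrupted while waiting for queue space", e);
        }
    };

    static ThreadPoolExecutor newBlockingStage(int threads, int maxPending) {
        return new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(maxPending), BLOCK_CALLER);
    }

    // Pushes `count` short tasks through a blocking stage; returns how many completed.
    static int runAll(int threads, int maxPending, int count) {
        ThreadPoolExecutor stage = newBlockingStage(threads, maxPending);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < count; i++) {
            stage.execute(() -> {
                try {
                    Thread.sleep(1); // simulate a slow write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        stage.shutdown(); // already-queued tasks still run
        try {
            stage.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("completed=" + runAll(2, 4, 100));
    }
}
```

The trade-off: nothing is dropped, but the threads that submit work stall, and a task that races with shutdown() can still land in the queue, so a real implementation would need more care around shutdown.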
>>> cassandra 0.6.1
>>>
>>> java version
>>> $ java -version
>>> java version "1.6.0_20"
>>> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
>>> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>>>
>>> iostat
>>> $ iostat -x -l 1
>>> Device:         rrqm/s   wrqm/s    r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
>>> sda              81.00  8175.00 224.00 17.00 23984.00  2728.00   221.68     1.01    1.86   0.76  18.20
>>>
>>> tpstats (of course, this node is still alive):
>>> $ ./nodetool -host localhost tpstats
>>> Pool Name                    Active   Pending      Completed
>>> FILEUTILS-DELETE-POOL             0         0           1281
>>> STREAM-STAGE                      0         0              0
>>> RESPONSE-STAGE                    0         0      473617241
>>> ROW-READ-STAGE                    0         0              0
>>> LB-OPERATIONS                     0         0              0
>>> MESSAGE-DESERIALIZER-POOL         0         0      718355184
>>> GMFD                              0         0         132509
>>> LB-TARGET                         0         0              0
>>> CONSISTENCY-MANAGER               0         0              0
>>> ROW-MUTATION-STAGE                0         0      293735704
>>> MESSAGE-STREAMING-POOL            0         0              6
>>> LOAD-BALANCER-STAGE               0         0              0
>>> FLUSH-SORTER-POOL                 0         0              0
>>> MEMTABLE-POST-FLUSHER             0         0           1870
>>> FLUSH-WRITER-POOL                 0         0           1870
>>> AE-SERVICE-STAGE                  0         0              5
>>> HINTED-HANDOFF-POOL               0         0             21
>>>
>>>
>>> On Tue, Apr 27, 2010 at 3:32 AM, Chris Goffinet <goffi...@digg.com> wrote:
>>>
>>>> Upgrade to b20 of Sun's JVM. This OOM might be related to
>>>> LinkedBlockingQueue issues that were fixed there.
>>>>
>>>> -Chris
>>>>
>>>>
>>>> 2010/4/26 Roland Hänel <rol...@haenel.me>
>>>>
>>>>> Cassandra version: 0.6.1
>>>>> JVM: OpenJDK Server VM (build 14.0-b16, mixed mode)
>>>>> I/O: import speed is about 10 MB/s for the full cluster; if a compaction
>>>>> is going on, the individual node is I/O limited.
>>>>> tpstats: you caught me, I didn't know about this. I will set up a test
>>>>> and try to catch a node during the critical time.
>>>>>
>>>>> Thanks,
>>>>> Roland
>>>>>
>>>>>
>>>>> 2010/4/26 Chris Goffinet <goffi...@digg.com>
>>>>>
>>>>>> Which version of Cassandra?
>>>>>> Which version of Java JVM are you using?
>>>>>> What do your I/O stats look like when bulk importing?
>>>>>> When you run `nodeprobe -host XXXX tpstats`, is any thread pool backing
>>>>>> up during the import?
>>>>>>
>>>>>> -Chris
>>>>>>
>>>>>>
>>>>>> 2010/4/26 Roland Hänel <rol...@haenel.me>
>>>>>>
>>>>>>> I have a cluster of 5 machines building a Cassandra datastore, and I
>>>>>>> load bulk data into it using the Java Thrift API. The first ~250 GB
>>>>>>> runs fine, then one of the nodes starts to throw OutOfMemory
>>>>>>> exceptions. I'm not using any row or index caches, and since I only
>>>>>>> have 5 CFs and some 2.5 GB of RAM allocated to the JVM (-Xmx2500M),
>>>>>>> in theory, that shouldn't happen. All inserts are done with
>>>>>>> consistency level ALL.
>>>>>>>
>>>>>>> I hope with this I have avoided all the 'usual dummy errors' that lead
>>>>>>> to OOMs. I have begun to troubleshoot the issue with JMX; however, it's
>>>>>>> difficult to catch the JVM at the right moment because it runs well for
>>>>>>> several hours before this happens.
>>>>>>>
>>>>>>> One thing comes to mind; maybe one of the experts could confirm or
>>>>>>> reject this idea for me: is it possible that when one machine slows
>>>>>>> down a little bit (for example because a big compaction is going on),
>>>>>>> the memtables don't get flushed to disk as fast as they are building
>>>>>>> up under the continuing bulk import? That would result in a downward
>>>>>>> spiral: the system gets slower and slower on disk I/O, but since more
>>>>>>> and more data arrives over Thrift, it finally runs out of memory.
>>>>>>>
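Roland's hypothesis can be illustrated with a toy per-second model of one node's queue. The rates below are made up for illustration, not measured in this thread: mutations arrive at a constant 100 per second, the node normally applies 120 per second, and during a hypothetical three-second compaction it applies only 60. With an unbounded queue nothing is dropped, so every second in which arrivals exceed the drain rate adds to the backlog, and the backlog only shrinks by the margin between drain and arrival once the slowdown ends.

```java
import java.util.Arrays;

public class BacklogModel {
    // Toy model: arrivals[i] mutations arrive in second i and at most drain[i]
    // are applied. Returns the backlog left in the (unbounded) queue after
    // each second. All rates are hypothetical inputs, not Cassandra internals.
    static long[] backlog(long[] arrivals, long[] drain) {
        if (arrivals.length != drain.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long[] out = new long[arrivals.length];
        long queued = 0;
        for (int i = 0; i < arrivals.length; i++) {
            queued += arrivals[i];
            queued -= Math.min(queued, drain[i]); // cannot apply more than is queued
            out[i] = queued;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] arrivals = {100, 100, 100, 100, 100, 100};
        long[] drain    = {120, 120,  60,  60,  60, 120}; // seconds 2-4: compaction
        System.out.println(Arrays.toString(backlog(arrivals, drain)));
        // prints [0, 0, 40, 80, 120, 100]
    }
}
```

If the slowdown itself makes the node slower still (the "downward spiral"), the drain rate never recovers and the backlog grows without limit until the heap is exhausted.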
>>>>>>> I'm using the "periodic" commit log sync; could this also create a
>>>>>>> situation where the commit log writer is too slow to catch up with the
>>>>>>> data intake, resulting in ever-growing memory usage?
>>>>>>>
>>>>>>> Maybe these thoughts are just bullshit. Let me know if so... ;-)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
