Hello Maxime,

Can you put the complete logs and config somewhere? It would be
interesting to know what the cause of the OOM is.
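In particular, the heap settings from cassandra-env.sh and the state of the
joining node would help. Something along these lines should capture it (just
a sketch; the cassandra-env.sh path assumes a package install, adjust for a
tarball layout):

    # Heap overrides, if any, currently set on the node
    grep -E 'MAX_HEAP_SIZE|HEAP_NEWSIZE' /etc/cassandra/cassandra-env.sh

    # Heap usage, thread pools and ring state as seen from the new node
    nodetool info
    nodetool tpstats
    nodetool status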
On Sun, Oct 26, 2014 at 3:15 AM, Maxime <maxim...@gmail.com> wrote:

> Thanks a lot, that is comforting. We are also small at the moment, so I
> can definitely relate to the idea of keeping things small and simple at a
> level where it just works.
>
> I see the new Apache version has a lot of fixes, so I will try to upgrade
> before I look into downgrading.
>
>
> On Saturday, October 25, 2014, Laing, Michael <michael.la...@nytimes.com>
> wrote:
>
>> Since no one else has stepped in...
>>
>> We have run clusters with ridiculously small nodes - I have a production
>> cluster in AWS with 4GB nodes, each with 1 CPU and disk-based instance
>> storage. It works fine, but you can see those little puppies struggle...
>>
>> And I ran into problems such as you observe...
>>
>> Upgrading Java to the latest 1.7 and - most importantly - *reverting to
>> the default configuration, esp. for heap*, seemed to settle things down
>> completely. Also make sure that you are using the 'recommended production
>> settings' from the docs on your boxen.
>>
>> However, we are running 2.0.x, not 2.1.0, so YMMV.
>>
>> And we are switching to 15GB nodes with 2 heftier CPUs each and SSD
>> storage - still a 'small' machine, but much more reasonable for C*.
>>
>> However, I can't say I am an expert, since I deliberately keep things so
>> simple that we do not encounter problems - it just works, so I dig into
>> other stuff.
>>
>> ml
>>
>>
>> On Sat, Oct 25, 2014 at 5:22 PM, Maxime <maxim...@gmail.com> wrote:
>>
>>> Hello, I've been trying to add a new node to my cluster (4 nodes) for
>>> a few days now.
>>>
>>> I started by adding a node similar to my current configuration: 4 GB of
>>> RAM + 2 cores on DigitalOcean. However, every time I would end up
>>> getting OOM errors after many log entries of the type:
>>>
>>> INFO [SlabPoolCleaner] 2014-10-25 13:44:57,240
>>> ColumnFamilyStore.java:856 - Enqueuing flush of mycf: 5383 (0%) on-heap, 0
>>> (0%) off-heap
>>>
>>> leading to:
>>>
>>> ka-120-Data.db (39291 bytes) for commitlog position
>>> ReplayPosition(segmentId=1414243978538, position=23699418)
>>> WARN [SharedPool-Worker-13] 2014-10-25 13:48:18,032
>>> AbstractTracingAwareExecutorService.java:167 - Uncaught exception on thread
>>> Thread[SharedPool-Worker-13,5,main]: {}
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> Thinking it had to do with either compaction or streaming, two
>>> activities I've had tremendous issues with in the past, I tried slowing
>>> down setstreamthroughput to extremely low values, all the way to 5. I
>>> also tried setting setcompactionthroughput to 0 (unthrottled), and then,
>>> after reading that in some cases unthrottled compaction might be too
>>> fast, down to 8. Nothing worked; it merely shifted the mean time to OOM,
>>> but not in a way suggesting either was anywhere near a solution.
>>>
>>> The nodes were configured with 2 GB of heap initially; I tried to crank
>>> it up to 3 GB, stressing the host memory to its limit.
>>>
>>> After doing some exploration (I am considering writing a Cassandra ops
>>> document with lessons learned, since there seems to be little of that in
>>> an organized fashion), I read that some people had strange issues on
>>> lower-end boxes like that, so I bit the bullet and upgraded my new node
>>> to an 8 GB + 4 core instance, which was anecdotally better.
>>>
>>> To my complete shock, the exact same issues are present, even after
>>> raising the heap to 6 GB. I figure it can't be a "normal" situation
>>> anymore, but must be a bug somehow.
>>>
>>> My cluster is 4 nodes, RF of 2, with about 160 GB of data across all
>>> nodes and about 10 CFs of varying sizes. Runtime writes are between 300
>>> and 900 per second. Cassandra 2.1.0, nothing too wild.
>>>
>>> Has anyone encountered these kinds of issues before? I would really
>>> enjoy hearing about the experiences of people trying to run small-sized
>>> clusters like mine. From everything I read, Cassandra operations go very
>>> well on large (16 GB + 8 core) machines, but I'm sad to report I've had
>>> nothing but trouble trying to run on smaller machines. Perhaps I can
>>> learn from others' experience?
>>>
>>> Full logs can be provided to anyone interested.
>>>
>>> Cheers
>>>
>>
>>
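For reference, the throttling and heap changes described in the thread map
roughly onto the following. This is only a sketch: the values are the ones
Maxime mentions, the HEAP_NEWSIZE value is a placeholder (it is not given in
the thread, but the stock 2.1 cassandra-env.sh expects it whenever
MAX_HEAP_SIZE is set), and commenting both heap lines out restores the
auto-calculated defaults Michael suggests reverting to.

    # Throttle streaming (megabits/s) and compaction (MB/s) on the joining
    # node; 0 means unthrottled
    nodetool setstreamthroughput 5
    nodetool setcompactionthroughput 8

    # Explicit heap override in cassandra-env.sh
    MAX_HEAP_SIZE="6G"
    HEAP_NEWSIZE="800M"   # placeholder value, not from the thread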