Thanks, Sebastian. A couple of questions (I'm really new to Cassandra):

1. How do I interpret the output of 'nodetool cfstats' to figure out the issues? Any documentation pointer on that would be helpful.
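Not speaking for the official docs, but in the 2.1-era cfstats output the fields that usually point at the problem Sebastian described are "Compacted partition maximum bytes" (your largest partition) and "Maximum live cells per slice". A minimal, hypothetical sketch that scans cfstats output for oversized partitions - the field and table labels are as printed by Cassandra 2.1, and the 100 MB threshold is just an illustrative cutoff:

```python
# Sketch: flag tables whose largest compacted partition exceeds a threshold,
# given the text output of `nodetool cfstats` (Cassandra 2.1 field names).
THRESHOLD_BYTES = 100 * 1024 * 1024  # illustrative ~100 MB cutoff

def find_large_partitions(cfstats_text, threshold=THRESHOLD_BYTES):
    """Return a list of (table, max_partition_bytes) pairs over the threshold."""
    flagged = []
    table = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        if line.startswith("Table:"):
            # cfstats prints one "Table: <name>" header per column family
            table = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted partition maximum bytes:"):
            size = int(line.split(":", 1)[1].strip())
            if table is not None and size > threshold:
                flagged.append((table, size))
    return flagged

# Tiny inline demo with a fabricated cfstats fragment:
sample = """Keyspace: ks
\tTable: events
\t\tCompacted partition maximum bytes: 268650950
\tTable: small
\t\tCompacted partition maximum bytes: 1024
"""
print(find_large_partitions(sample))  # -> [('events', 268650950)]
```

In practice you would pipe the real output in (e.g. `nodetool cfstats > cfstats.txt` and feed the file to the function) and compare the flagged tables against Jack's ~10MB-per-partition guidance below.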
2. I'm primarily a Python/C developer - so, totally clueless about the JVM environment. Please bear with me, as I will need a lot of hand-holding. Should I just copy-paste the settings you gave and try to restart the failing cassandra server?

Thanks,
Kunal

On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
> #1 You need more information.
>
> a) Take a look at your .hprof file (the memory heap dump from the OOM) with an
> introspection tool like jhat, VisualVM, or Java Flight Recorder and see
> what is using up your RAM.
>
> b) How big are your large rows (use nodetool cfstats on each node)? If
> your data model is bad, you are going to have to re-design it no matter
> what.
>
> #2 As a possible workaround, try using the G1GC collector with the settings
> from C* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr:
> G1GC is much simpler than CMS and almost as good as a finely tuned CMS).
> *Note:* Use it with the latest Java 8 from Oracle. Do *not* set the
> newgen size for G1 - it sets it dynamically:
>
>> # min and max heap sizes should be set to the same value to avoid
>> # stop-the-world GC pauses during resize, and so that we can lock the
>> # heap in memory on startup to prevent any of it from being swapped
>> # out.
>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>
>> # Per-thread stack size.
>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>
>> # Use the Hotspot garbage-first collector.
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>
>> # Have the JVM do less remembered set work during STW, instead
>> # preferring concurrent GC. Reduces p99.9 latency.
>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>
>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>> # Machines with > 10 cores may need additional threads.
>> # Increase to <= full cores (do not count HT cores).
>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>
>> # Main G1GC tunable: lowering the pause target will lower throughput,
>> # and vice versa.
>> # 200ms is the JVM default and lowest viable setting.
>> # 1000ms increases throughput. Keep it smaller than the timeouts in
>> # cassandra.yaml.
>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>
>> # Do reference processing in parallel GC.
>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>
>> # This may help eliminate STW.
>> # The default in Hotspot 8u40 is 40%.
>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>
>> # For workloads that do large allocations, increasing the region
>> # size may make things more efficient. Otherwise, let the JVM
>> # set this automatically.
>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>
>> # Make sure all memory is faulted and zeroed on startup.
>> # This helps prevent soft faults in containers and makes
>> # transparent hugepage allocation more effective.
>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>
>> # Biased locking does not benefit Cassandra.
>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>
>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>
>> # Enable thread-local allocation blocks and allow the JVM to automatically
>> # resize them at runtime.
>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>
>> # http://www.evanjones.ca/jvm-mmap-pause.html
>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>
> All the best,
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
>> I upgraded my instance from 8GB to a 14GB one.
>> Allocated 8GB to the JVM heap in cassandra-env.sh.
>>
>> And now, it crashes even faster with an OOM..
>>
>> Earlier, with a 4GB heap, I could get up to ~90% replication completion (as
>> reported by nodetool netstats); now, with an 8GB heap, I cannot even get
>> there. I've already restarted the cassandra service 4 times with the 8GB heap.
>>
>> No clue what's going on.. :(
>>
>> Kunal
>>
>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>
>>> You, and only you, are responsible for knowing your data and data model.
>>>
>>> If columns per row or rows per partition can be large, then an 8GB
>>> system is probably too small. But the real issue is that you need to keep
>>> your partition size from getting too large.
>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>> partitions - say, under 10MB.
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>
>>>> I'm new to cassandra.
>>>> How do I find those out? - mainly, the partition params that you asked
>>>> for. The others, I think I can figure out.
>>>>
>>>> We don't have any large objects/blobs in the column values - it's all
>>>> textual, date-time, numeric and uuid data.
>>>>
>>>> We use cassandra primarily to store segmentation data - with segment
>>>> type as the partition key. That is again divided into two separate column
>>>> families, but they have a similar structure.
>>>>
>>>> Columns per row can be fairly large - each segment type is the row key,
>>>> with associated user ids and timestamps as column values.
>>>>
>>>> Thanks,
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>
>>>>> What does your data and data model look like - partition size, rows
>>>>> per partition, number of columns per row, any large values/blobs in
>>>>> column values?
>>>>>
>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>> partitions are reasonably small. Any large partitions could blow you away.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>
>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>
>>>>>> Kunal
>>>>>>
>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>
>>>>>>> Forgot to mention: the data size is not that big - it's barely 10GB
>>>>>>> in all.
>>>>>>>
>>>>>>> Kunal
>>>>>>>
>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a 2-node setup on Azure (East US region) running Ubuntu
>>>>>>>> Server 14.04 LTS.
>>>>>>>> Both nodes have 8GB RAM.
>>>>>>>>
>>>>>>>> One of the nodes (the seed node) died with an OOM - so I am trying
>>>>>>>> to add a replacement node with the same configuration.
>>>>>>>>
>>>>>>>> The problem is that this new node also keeps dying with an OOM - I've
>>>>>>>> restarted the cassandra service 8-10 times hoping that it would
>>>>>>>> finish the replication. But it didn't help.
>>>>>>>>
>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>> All nodes have a similar configuration - with libjna installed.
>>>>>>>>
>>>>>>>> Cassandra is installed from DataStax's Debian repo - pkg dsc21,
>>>>>>>> version 2.1.7.
>>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>>> cassandra-env.sh - which calculates the heap size automatically
>>>>>>>> (1/4 * RAM = 2GB).
>>>>>>>>
>>>>>>>> But that didn't help. So I then tried to increase the heap to 4GB
>>>>>>>> manually and restarted. It still keeps crashing.
>>>>>>>>
>>>>>>>> Any clue as to why it's happening?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kunal
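For anyone following along: the "1/4 * RAM" mentioned above is a simplification. If I recall the stock cassandra-env.sh of that era correctly, it computes roughly max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)). A small sketch of that arithmetic (my paraphrase of the script's logic, not the script itself):

```python
# Rough sketch of the automatic MAX_HEAP_SIZE calculation in the stock
# cassandra-env.sh (Cassandra 2.1 era), working in whole megabytes:
#   max(min(system_memory/2, 1024MB), min(system_memory/4, 8192MB))
def default_max_heap_mb(system_memory_mb):
    half = min(system_memory_mb // 2, 1024)      # capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)   # capped at 8 GB
    return max(half, quarter)

print(default_max_heap_mb(8192))   # 8 GB node  -> 2048 (the 2GB Kunal saw)
print(default_max_heap_mb(14336))  # 14 GB node -> 3584
```

If this is right, it suggests the automatic sizing on the upgraded 14GB instance would have picked ~3.5GB rather than the 8GB set manually, which is worth keeping in mind when comparing the crash behaviour across heap sizes.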