Thanks, Sebastian. A couple of questions (I'm really new to Cassandra):

1. How do I interpret the output of 'nodetool cfstats' to figure out the issues? Any documentation pointer on that would be helpful.
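Not speaking for the official docs, but in the 2.1-era cfstats output the fields that usually point at the problem Sebastian described are "Compacted partition maximum bytes" (your largest partition) and "Maximum live cells per slice". A minimal, hypothetical sketch that scans cfstats output for oversized partitions - the field and table labels are as printed by Cassandra 2.1, and the 100 MB threshold is just an illustrative cutoff:

```python
# Sketch: flag tables whose largest compacted partition exceeds a threshold,
# given the text output of `nodetool cfstats` (Cassandra 2.1 field names).
THRESHOLD_BYTES = 100 * 1024 * 1024  # illustrative ~100 MB cutoff

def find_large_partitions(cfstats_text, threshold=THRESHOLD_BYTES):
    """Return a list of (table, max_partition_bytes) pairs over the threshold."""
    flagged = []
    table = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        if line.startswith("Table:"):
            # cfstats prints one "Table: <name>" header per column family
            table = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted partition maximum bytes:"):
            size = int(line.split(":", 1)[1].strip())
            if table is not None and size > threshold:
                flagged.append((table, size))
    return flagged

# Tiny inline demo with a fabricated cfstats fragment:
sample = """Keyspace: ks
\tTable: events
\t\tCompacted partition maximum bytes: 268650950
\tTable: small
\t\tCompacted partition maximum bytes: 1024
"""
print(find_large_partitions(sample))  # -> [('events', 268650950)]
```

In practice you would pipe the real output in (e.g. `nodetool cfstats > cfstats.txt` and feed the file to the function) and compare the flagged tables against Jack's ~10MB-per-partition guidance below.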
2. I'm primarily a Python/C developer - so, totally clueless about the JVM environment. Please bear with me, as I will need a lot of hand-holding. Should I just copy-paste the settings you gave and try to restart the failing cassandra server?

Thanks,
Kunal

On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
> #1 You need more information.
>
> a) Take a look at your .hprof file (the memory heap dump from the OOM) with an
> introspection tool like jhat, VisualVM, or Java Flight Recorder and see
> what is using up your RAM.
>
> b) How big are your large rows (use nodetool cfstats on each node)? If
> your data model is bad, you are going to have to re-design it no matter
> what.
>
> #2 As a possible workaround, try using the G1GC collector with the settings
> from C* 3.0 instead of CMS. I've seen lots of success with it lately (tl;dr:
> G1GC is much simpler than CMS and almost as good as a finely tuned CMS).
> *Note:* Use it with the latest Java 8 from Oracle. Do *not* set the
> newgen size for G1 - it sets it dynamically:
>
>> # min and max heap sizes should be set to the same value to avoid
>> # stop-the-world GC pauses during resize, and so that we can lock the
>> # heap in memory on startup to prevent any of it from being swapped
>> # out.
>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>
>> # Per-thread stack size.
>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>
>> # Use the Hotspot garbage-first collector.
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>
>> # Have the JVM do less remembered set work during STW, instead
>> # preferring concurrent GC. Reduces p99.9 latency.
>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>
>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>> # Machines with > 10 cores may need additional threads.
>> # Increase to <= full cores (do not count HT cores).
>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>
>> # Main G1GC tunable: lowering the pause target will lower throughput,
>> # and vice versa.
>> # 200ms is the JVM default and lowest viable setting.
>> # 1000ms increases throughput. Keep it smaller than the timeouts in
>> # cassandra.yaml.
>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>
>> # Do reference processing in parallel GC.
>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>
>> # This may help eliminate STW.
>> # The default in Hotspot 8u40 is 40%.
>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>
>> # For workloads that do large allocations, increasing the region
>> # size may make things more efficient. Otherwise, let the JVM
>> # set this automatically.
>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>
>> # Make sure all memory is faulted and zeroed on startup.
>> # This helps prevent soft faults in containers and makes
>> # transparent hugepage allocation more effective.
>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>
>> # Biased locking does not benefit Cassandra.
>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>
>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>
>> # Enable thread-local allocation blocks and allow the JVM to automatically
>> # resize them at runtime.
>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>
>> # http://www.evanjones.ca/jvm-mmap-pause.html
>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>
> All the best,
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>
>> I upgraded my instance from 8GB to a 14GB one.
>> Allocated 8GB to the JVM heap in cassandra-env.sh.
>>
>> And now, it crashes even faster with an OOM..
>>
>> Earlier, with a 4GB heap, I could get up to ~90% replication completion (as
>> reported by nodetool netstats); now, with an 8GB heap, I cannot even get
>> there. I've already restarted the cassandra service 4 times with the 8GB heap.
>>
>> No clue what's going on.. :(
>>
>> Kunal
>>
>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>
>>> You, and only you, are responsible for knowing your data and data model.
>>>
>>> If columns per row or rows per partition can be large, then an 8GB
>>> system is probably too small. But the real issue is that you need to keep
>>> your partition size from getting too large.
>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>> partitions - say, under 10MB.
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>
>>>> I'm new to cassandra.
>>>> How do I find those out? - mainly, the partition params that you asked
>>>> for. The others, I think I can figure out.
>>>>
>>>> We don't have any large objects/blobs in the column values - it's all
>>>> textual, date-time, numeric and uuid data.
>>>>
>>>> We use cassandra primarily to store segmentation data - with segment
>>>> type as the partition key. That is again divided into two separate column
>>>> families, but they have a similar structure.
>>>>
>>>> Columns per row can be fairly large - each segment type is the row key,
>>>> with associated user ids and timestamps as column values.
>>>>
>>>> Thanks,
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>
>>>>> What does your data and data model look like - partition size, rows
>>>>> per partition, number of columns per row, any large values/blobs in
>>>>> column values?
>>>>>
>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>> partitions are reasonably small. Any large partitions could blow you away.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>
>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>
>>>>>> Kunal
>>>>>>
>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>
>>>>>>> Forgot to mention: the data size is not that big - it's barely 10GB
>>>>>>> in all.
>>>>>>>
>>>>>>> Kunal
>>>>>>>
>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a 2-node setup on Azure (East US region) running Ubuntu
>>>>>>>> Server 14.04 LTS.
>>>>>>>> Both nodes have 8GB RAM.
>>>>>>>>
>>>>>>>> One of the nodes (the seed node) died with an OOM - so I am trying
>>>>>>>> to add a replacement node with the same configuration.
>>>>>>>>
>>>>>>>> The problem is that this new node also keeps dying with an OOM - I've
>>>>>>>> restarted the cassandra service 8-10 times hoping that it would
>>>>>>>> finish the replication. But it didn't help.
>>>>>>>>
>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>> All nodes have a similar configuration - with libjna installed.
>>>>>>>>
>>>>>>>> Cassandra is installed from DataStax's Debian repo - pkg dsc21,
>>>>>>>> version 2.1.7.
>>>>>>>> I started off with the default configuration - i.e. the default
>>>>>>>> cassandra-env.sh - which calculates the heap size automatically
>>>>>>>> (1/4 * RAM = 2GB).
>>>>>>>>
>>>>>>>> But that didn't help. So I then tried to increase the heap to 4GB
>>>>>>>> manually and restarted. It still keeps crashing.
>>>>>>>>
>>>>>>>> Any clue as to why it's happening?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kunal
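For anyone following along: the "1/4 * RAM" mentioned above is a simplification. If I recall the stock cassandra-env.sh of that era correctly, it computes roughly max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)). A small sketch of that arithmetic (my paraphrase of the script's logic, not the script itself):

```python
# Rough sketch of the automatic MAX_HEAP_SIZE calculation in the stock
# cassandra-env.sh (Cassandra 2.1 era), working in whole megabytes:
#   max(min(system_memory/2, 1024MB), min(system_memory/4, 8192MB))
def default_max_heap_mb(system_memory_mb):
    half = min(system_memory_mb // 2, 1024)      # capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)   # capped at 8 GB
    return max(half, quarter)

print(default_max_heap_mb(8192))   # 8 GB node  -> 2048 (the 2GB Kunal saw)
print(default_max_heap_mb(14336))  # 14 GB node -> 3584
```

If this is right, it suggests the automatic sizing on the upgraded 14GB instance would have picked ~3.5GB rather than the 8GB set manually, which is worth keeping in mind when comparing the crash behaviour across heap sizes.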