Are you on Azure premium storage? http://www.datastax.com/2015/04/getting-started-with-azure-premium-storage-and-datastax-enterprise-dse
Secondary indexes are built for convenience, not performance. http://www.datastax.com/resources/data-modeling What's your compaction strategy? Your nodes have to come up before they can start compacting.

On Jul 13, 2015 1:11 AM, "Kunal Gangakhedkar" <kgangakhed...@gmail.com> wrote:
> Hi,
>
> Looks like that is my primary problem - the sstable count for the
> daily_challenges column family is >5k. Azure had a scheduled maintenance
> window on Sat. All the VMs got rebooted one by one - including the current
> cassandra one - and it's taking forever to bring cassandra back up online.
>
> Is there any way I can re-organize my existing data so that I can bring
> down that count? I don't want to lose that data.
> If possible, can I do that while cassandra is down? As I mentioned, it's
> taking forever to get the service up - it's stuck reading those 5k
> sstables (+ another 5k corresponding secondary-index files). :(
> Oh, did I mention I'm new to cassandra?
>
> Thanks,
> Kunal
>
> On 11 July 2015 at 03:29, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>> #1
>>
>>> There is one table - daily_challenges - which shows compacted partition
>>> max bytes as ~460M and another one - daily_guest_logins - which shows
>>> compacted partition max bytes as ~36M.
>>
>> 460M is high; I like to keep my partitions under 100MB when possible.
>> I've seen worse, though. The fix is to add something else (maybe month or
>> week or something) into your partition key:
>>
>> PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)
>>
>> #2 Looks like your jamm version is 3 per your env.sh, so you're probably
>> okay to copy the env.sh over from the C* 3.0 link I shared once you
>> uncomment and tweak MAX_HEAP. If there's something wrong, your node
>> won't come up. Tail your logs.
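Sebastian's suggested fix above - folding a coarse time component into the partition key so one segment_type can no longer grow into a single ~460MB partition - can be sketched as follows. This is a minimal illustration, not code from the thread: the `month_bucket` name and the revised schema are hypothetical, and the bucket granularity (week vs. month) should be chosen so partitions stay under ~100MB for this workload.

```python
from datetime import datetime

def month_bucket(ts: datetime) -> str:
    """Derive a coarse time bucket to widen the partition key.

    Every (segment_type, bucket) pair becomes its own partition, so
    writes for one segment type spread across many partitions instead
    of accumulating in one.
    """
    return ts.strftime("%Y-%m")

# Hypothetical revised schema (sketch only):
#
# CREATE TABLE daily_challenges (
#     segment_type text,
#     month_bucket text,
#     date timestamp,
#     user_id int,
#     sess_id text,
#     data text,
#     PRIMARY KEY ((segment_type, month_bucket), date, user_id, sess_id)
# ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);

if __name__ == "__main__":
    # Both reads and writes must compute the same bucket value.
    print(month_bucket(datetime(2015, 7, 13)))  # 2015-07
```

The trade-off: queries that span multiple buckets must issue one query per bucket, so pick the coarsest granularity that still keeps partitions bounded.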
>>
>> All the best,
>>
>> Sebastián Estévez
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world's most innovative enterprises.
>> DataStax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the world's
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>> On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>> And here is my cassandra-env.sh:
>>> https://gist.github.com/kunalg/2c092cb2450c62be9a20
>>>
>>> Kunal
>>>
>>> On 11 July 2015 at 00:04, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>> From the jhat output, the top 10 entries for "Instance Count for All
>>>> Classes (excluding platform)" show:
>>>>
>>>> 2088223 instances of class org.apache.cassandra.db.BufferCell
>>>> 1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
>>>> 1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
>>>> 630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
>>>> 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
>>>> 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
>>>> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
>>>> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
>>>> 90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
>>>> 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey
>>>>
>>>> At the bottom of the page, it shows:
>>>> Total of 8739510 instances occupying 193607512 bytes.
>>>> JFYI.
>>>>
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>> Thanks for the quick reply.
>>>>>
>>>>> 1. I don't know what thresholds I should look for. So, to save this
>>>>> back-and-forth, I'm attaching the cfstats output for the keyspace.
>>>>>
>>>>> There is one table - daily_challenges - which shows compacted partition
>>>>> max bytes as ~460M and another one - daily_guest_logins - which shows
>>>>> compacted partition max bytes as ~36M.
>>>>>
>>>>> Can that be a problem?
>>>>> Here is the CQL schema for the daily_challenges column family:
>>>>>
>>>>> CREATE TABLE app_10001.daily_challenges (
>>>>>     segment_type text,
>>>>>     date timestamp,
>>>>>     user_id int,
>>>>>     sess_id text,
>>>>>     data text,
>>>>>     deleted boolean,
>>>>>     PRIMARY KEY (segment_type, date, user_id, sess_id)
>>>>> ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>>>     AND comment = ''
>>>>>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>>>>>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>     AND default_time_to_live = 0
>>>>>     AND gc_grace_seconds = 864000
>>>>>     AND max_index_interval = 2048
>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>     AND min_index_interval = 128
>>>>>     AND read_repair_chance = 0.0
>>>>>     AND speculative_retry = '99.0PERCENTILE';
>>>>>
>>>>> CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);
>>>>>
>>>>> 2. I don't know - how do I check? As I mentioned, I just installed the
>>>>> dsc21 package (ver 2.1.7) from DataStax's debian repo.
>>>>>
>>>>> Really appreciate your help.
>>>>>
>>>>> Thanks,
>>>>> Kunal
>>>>>
>>>>> On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>>> 1. You want to look at # of sstables in cfhistograms, or in cfstats
>>>>>> look at:
>>>>>> Compacted partition maximum bytes
>>>>>> Maximum live cells per slice
>>>>>>
>>>>>> 2) No. Here's the env.sh from 3.0, which should work with some tweaks:
>>>>>> https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh
>>>>>>
>>>>>> You'll at least have to modify the jamm version to what's in yours.
>>>>>> I think it's 2.5.
>>>>>>
>>>>>> All the best,
>>>>>>
>>>>>> Sebastián Estévez
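The two cfstats numbers Sebastian points at can be checked mechanically across a keyspace. Below is a hedged sketch that scans `nodetool cfstats` text for tables whose "Compacted partition maximum bytes" exceeds a threshold; note the exact cfstats layout varies between Cassandra versions ("Table:" vs. "Column Family:" headers), so the sample text and regexes here are illustrative assumptions, not guaranteed to match every release.

```python
import re

# Assumption: ~100MB as the comfort limit for partition size,
# per Sebastian's "under 100mb when possible" rule of thumb.
MAX_PARTITION_BYTES = 100 * 1024 * 1024

def flag_large_partitions(cfstats_text: str, limit: int = MAX_PARTITION_BYTES):
    """Return (table, max_partition_bytes) pairs whose
    'Compacted partition maximum bytes' exceeds `limit`."""
    flagged = []
    table = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        m = re.match(r"Table(?: \(index\))?: (\S+)", line)
        if m:
            table = m.group(1)
            continue
        m = re.match(r"Compacted partition maximum bytes: (\d+)", line)
        if m and table and int(m.group(1)) > limit:
            flagged.append((table, int(m.group(1))))
    return flagged

# Illustrative sample mirroring the numbers reported in this thread:
sample = """\
Table: daily_challenges
Compacted partition maximum bytes: 460000000
Table: daily_guest_logins
Compacted partition maximum bytes: 36000000
"""
print(flag_large_partitions(sample))  # [('daily_challenges', 460000000)]
```

On the thread's numbers, daily_challenges (~460MB) is flagged while daily_guest_logins (~36MB) passes, matching Sebastian's assessment.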
>>>>>>
>>>>>> On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>> Thanks, Sebastian.
>>>>>>>
>>>>>>> Couple of questions (I'm really new to cassandra):
>>>>>>> 1. How do I interpret the output of 'nodetool cfstats' to figure out
>>>>>>> the issues? Any documentation pointer on that would be helpful.
>>>>>>>
>>>>>>> 2. I'm primarily a python/c developer - so, totally clueless about
>>>>>>> the JVM environment. Please bear with me, as I will need a lot of
>>>>>>> hand-holding. Should I just copy+paste the settings you gave and try
>>>>>>> to restart the failing cassandra server?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kunal
>>>>>>>
>>>>>>> On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>>>>> #1 You need more information.
>>>>>>>>
>>>>>>>> a) Take a look at your .hprof file (memory heap from the OOM) with
>>>>>>>> an introspection tool like jhat, visualvm, or Java Flight Recorder,
>>>>>>>> and see what is using up your RAM.
>>>>>>>>
>>>>>>>> b) How big are your large rows (use nodetool cfstats on each node)?
>>>>>>>> If your data model is bad, you are going to have to re-design it no
>>>>>>>> matter what.
>>>>>>>>
>>>>>>>> #2 As a possible workaround, try using the G1 garbage collector with
>>>>>>>> the settings from C* 3.0 instead of CMS. I've seen lots of success
>>>>>>>> with it lately (tl;dr G1GC is much simpler than CMS and almost as
>>>>>>>> good as a finely tuned CMS). *Note:* Use it with the latest Java 8
>>>>>>>> from Oracle. Do *not* set the newgen size; G1 sets it dynamically:
>>>>>>>>
>>>>>>>>> # min and max heap sizes should be set to the same value to avoid
>>>>>>>>> # stop-the-world GC pauses during resize, and so that we can lock the
>>>>>>>>> # heap in memory on startup to prevent any of it from being swapped
>>>>>>>>> # out.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>>>>>>>>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>>>>>>>>
>>>>>>>>> # Per-thread stack size.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>>>>>>>>
>>>>>>>>> # Use the Hotspot garbage-first collector.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>>>>>>>>
>>>>>>>>> # Have the JVM do less remembered set work during STW, instead
>>>>>>>>> # preferring concurrent GC. Reduces p99.9 latency.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>>>>>>>>
>>>>>>>>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>>>>>>>>> # Machines with > 10 cores may need additional threads.
>>>>>>>>> # Increase to <= full cores (do not count HT cores).
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>>>>>>>>
>>>>>>>>> # Main G1GC tunable: lowering the pause target will lower
>>>>>>>>> # throughput and vice versa.
>>>>>>>>> # 200ms is the JVM default and lowest viable setting;
>>>>>>>>> # 1000ms increases throughput. Keep it smaller than the timeouts
>>>>>>>>> # in cassandra.yaml.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>>>>>>>>
>>>>>>>>> # Do reference processing in parallel GC.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>>>>>>>>
>>>>>>>>> # This may help eliminate STW.
>>>>>>>>> # The default in Hotspot 8u40 is 40%.
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>>>>>>>>
>>>>>>>>> # For workloads that do large allocations, increasing the region
>>>>>>>>> # size may make things more efficient. Otherwise, let the JVM
>>>>>>>>> # set this automatically.
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>>>>>>>>
>>>>>>>>> # Make sure all memory is faulted and zeroed on startup.
>>>>>>>>> # This helps prevent soft faults in containers and makes
>>>>>>>>> # transparent hugepage allocation more effective.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>>>>>>>>
>>>>>>>>> # Biased locking does not benefit Cassandra.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>>>>>>>>
>>>>>>>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>>>>>>>>
>>>>>>>>> # Enable thread-local allocation blocks and allow the JVM to
>>>>>>>>> # automatically resize them at runtime.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>>>>>>>>
>>>>>>>>> # http://www.evanjones.ca/jvm-mmap-pause.html
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>>
>>>>>>>> Sebastián Estévez
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>> I upgraded my instance from 8GB to a 14GB one.
>>>>>>>>> Allocated 8GB to the JVM heap in cassandra-env.sh.
>>>>>>>>>
>>>>>>>>> And now, it crashes even faster with an OOM.
>>>>>>>>>
>>>>>>>>> Earlier, with a 4GB heap, I could get up to ~90% replication
>>>>>>>>> completion (as reported by nodetool netstats); now, with an 8GB
>>>>>>>>> heap, I cannot even get there. I've already restarted the cassandra
>>>>>>>>> service 4 times with the 8GB heap.
>>>>>>>>>
>>>>>>>>> No clue what's going on. :(
>>>>>>>>>
>>>>>>>>> Kunal
>>>>>>>>>
>>>>>>>>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>>> You, and only you, are responsible for knowing your data and data
>>>>>>>>>> model.
>>>>>>>>>>
>>>>>>>>>> If columns per row or rows per partition can be large, then an 8GB
>>>>>>>>>> system is probably too small. But the real issue is that you need
>>>>>>>>>> to keep your partition size from getting too large.
>>>>>>>>>>
>>>>>>>>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>>>>>>>>> partitions, like under 10MB.
>>>>>>>>>>
>>>>>>>>>> -- Jack Krupansky
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>> I'm new to cassandra.
>>>>>>>>>>> How do I find those out? - mainly, the partition params that you
>>>>>>>>>>> asked for. Others, I think I can figure out.
>>>>>>>>>>>
>>>>>>>>>>> We don't have any large objects/blobs in the column values - it's
>>>>>>>>>>> all textual, date-time, numeric and uuid data.
>>>>>>>>>>>
>>>>>>>>>>> We use cassandra primarily to store segmentation data - with
>>>>>>>>>>> segment type as the partition key. That is again divided into two
>>>>>>>>>>> separate column families, but they have similar structure.
>>>>>>>>>>>
>>>>>>>>>>> Columns per row can be fairly large - each segment type as the
>>>>>>>>>>> row key and associated user ids and timestamp as column value.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kunal
>>>>>>>>>>>
>>>>>>>>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>>>>> What does your data and data model look like - partition size,
>>>>>>>>>>>> rows per partition, number of columns per row, any large
>>>>>>>>>>>> values/blobs in column values?
>>>>>>>>>>>>
>>>>>>>>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>>>>>>>>> partitions are reasonably small. Any large partitions could blow
>>>>>>>>>>>> you away.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Jack Krupansky
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kunal
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>>> Forgot to mention: the data size is not that big - it's barely
>>>>>>>>>>>>>> 10GB in all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kunal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a 2-node setup on Azure (East US region) running
>>>>>>>>>>>>>>> Ubuntu Server 14.04 LTS. Both nodes have 8GB RAM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One of the nodes (the seed node) died with an OOM - so I am
>>>>>>>>>>>>>>> trying to add a replacement node with the same configuration.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is this new node also keeps dying with an OOM -
>>>>>>>>>>>>>>> I've restarted the cassandra service 8-10 times hoping that
>>>>>>>>>>>>>>> it would finish the replication. But it didn't help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>>>>>>>>> All nodes have similar configuration - with libjna installed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cassandra is installed from DataStax's debian repo - pkg:
>>>>>>>>>>>>>>> dsc21, version 2.1.7.
>>>>>>>>>>>>>>> I started off with the default configuration - i.e., the
>>>>>>>>>>>>>>> default cassandra-env.sh - which calculates the heap size
>>>>>>>>>>>>>>> automatically (1/4 * RAM = 2GB).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But that didn't help. So I then tried to increase the heap to
>>>>>>>>>>>>>>> 4GB manually and restarted. It still keeps crashing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any clue as to why it's happening?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Kunal
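For reference, the automatic heap sizing Kunal describes ("1/4 * RAM = 2GB") comes from the stock cassandra-env.sh arithmetic. The sketch below assumes the 2.x formula is max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)) - verify against your own env.sh before relying on it.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Mirror of the assumed stock cassandra-env.sh heap calculation:
    half of RAM capped at 1024MB, quarter of RAM capped at 8192MB,
    whichever is larger."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)

# On Kunal's 8GB node this works out to 2GB, matching the
# "1/4 * RAM = 2GB" he observed:
print(default_max_heap_mb(8192))   # 2048
print(default_max_heap_mb(14336))  # 3584 on the upgraded 14GB instance
```

This is why manually raising MAX_HEAP_SIZE in cassandra-env.sh changes behavior: the automatic formula is deliberately conservative on small boxes, leaving the remainder for the OS page cache and off-heap structures.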