Are you on Azure premium storage? http://www.datastax.com/2015/04/getting-started-with-azure-premium-storage-and-datastax-enterprise-dse
Secondary indexes are built for convenience, not performance. http://www.datastax.com/resources/data-modeling What's your compaction strategy? Your nodes have to come up before they can start compacting.

On Jul 13, 2015 1:11 AM, "Kunal Gangakhedkar" <kgangakhed...@gmail.com> wrote:
> Hi,
>
> Looks like that is my primary problem - the sstable count for the
> daily_challenges column family is >5k. Azure had a scheduled maintenance
> window on Sat. All the VMs got rebooted one by one - including the current
> cassandra one - and it's taking forever to bring cassandra back up online.
>
> Is there any way I can re-organize my existing data so that I can bring
> down that count? I don't want to lose that data.
> If possible, can I do that while cassandra is down? As I mentioned, it's
> taking forever to get the service up - it's stuck reading those 5k
> sstables (+ another 5k corresponding secondary-index files). :(
> Oh, did I mention I'm new to cassandra?
>
> Thanks,
> Kunal
>
> On 11 July 2015 at 03:29, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>> #1
>>
>>> There is one table - daily_challenges - which shows compacted partition
>>> max bytes as ~460M and another one - daily_guest_logins - which shows
>>> compacted partition max bytes as ~36M.
>>
>> 460M is high; I like to keep my partitions under 100MB when possible.
>> I've seen worse, though. The fix is to add something else (maybe month or
>> week or something) into your partition key:
>>
>> PRIMARY KEY ((segment_type, something_else), date, user_id, sess_id)
>>
>> #2 Looks like your jamm version is 3 per your env.sh, so you're probably
>> okay to copy the env.sh over from the C* 3.0 link I shared once you
>> uncomment and tweak MAX_HEAP. If there's something wrong, your node
>> won't come up. Tail your logs.
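Sebastian's suggested fix above - folding a coarse time component into the partition key so one segment_type can no longer grow into a single ~460MB partition - can be sketched as follows. This is a minimal illustration, not code from the thread: the `month_bucket` name and the revised schema are hypothetical, and the bucket granularity (week vs. month) should be chosen so partitions stay under ~100MB for this workload.

```python
from datetime import datetime

def month_bucket(ts: datetime) -> str:
    """Derive a coarse time bucket to widen the partition key.

    Every (segment_type, bucket) pair becomes its own partition, so
    writes for one segment type spread across many partitions instead
    of accumulating in one.
    """
    return ts.strftime("%Y-%m")

# Hypothetical revised schema (sketch only):
#
# CREATE TABLE daily_challenges (
#     segment_type text,
#     month_bucket text,
#     date timestamp,
#     user_id int,
#     sess_id text,
#     data text,
#     PRIMARY KEY ((segment_type, month_bucket), date, user_id, sess_id)
# ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC);

if __name__ == "__main__":
    # Both reads and writes must compute the same bucket value.
    print(month_bucket(datetime(2015, 7, 13)))  # 2015-07
```

The trade-off: queries that span multiple buckets must issue one query per bucket, so pick the coarsest granularity that still keeps partitions bounded.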
>>
>> All the best,
>>
>> Sebastián Estévez
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world's most innovative enterprises.
>> DataStax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the world's
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>> On Fri, Jul 10, 2015 at 2:44 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>> And here is my cassandra-env.sh:
>>> https://gist.github.com/kunalg/2c092cb2450c62be9a20
>>>
>>> Kunal
>>>
>>> On 11 July 2015 at 00:04, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>> From the jhat output, the top 10 entries for "Instance Count for All
>>>> Classes (excluding platform)" show:
>>>>
>>>> 2088223 instances of class org.apache.cassandra.db.BufferCell
>>>> 1983245 instances of class org.apache.cassandra.db.composites.CompoundSparseCellName
>>>> 1885974 instances of class org.apache.cassandra.db.composites.CompoundDenseCellName
>>>> 630000 instances of class org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
>>>> 503687 instances of class org.apache.cassandra.db.BufferDeletedCell
>>>> 378206 instances of class org.apache.cassandra.cql3.ColumnIdentifier
>>>> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref
>>>> 101800 instances of class org.apache.cassandra.utils.concurrent.Ref$State
>>>> 90704 instances of class org.apache.cassandra.utils.concurrent.Ref$GlobalState
>>>> 71123 instances of class org.apache.cassandra.db.BufferDecoratedKey
>>>>
>>>> At the bottom of the page, it shows:
>>>> Total of 8739510 instances occupying 193607512 bytes.
>>>> JFYI.
>>>>
>>>> Kunal
>>>>
>>>> On 10 July 2015 at 23:49, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>> Thanks for the quick reply.
>>>>>
>>>>> 1. I don't know what thresholds I should look for. So, to save this
>>>>> back-and-forth, I'm attaching the cfstats output for the keyspace.
>>>>>
>>>>> There is one table - daily_challenges - which shows compacted partition
>>>>> max bytes as ~460M and another one - daily_guest_logins - which shows
>>>>> compacted partition max bytes as ~36M.
>>>>>
>>>>> Can that be a problem?
>>>>> Here is the CQL schema for the daily_challenges column family:
>>>>>
>>>>> CREATE TABLE app_10001.daily_challenges (
>>>>>     segment_type text,
>>>>>     date timestamp,
>>>>>     user_id int,
>>>>>     sess_id text,
>>>>>     data text,
>>>>>     deleted boolean,
>>>>>     PRIMARY KEY (segment_type, date, user_id, sess_id)
>>>>> ) WITH CLUSTERING ORDER BY (date DESC, user_id ASC, sess_id ASC)
>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>>>>     AND comment = ''
>>>>>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>>>>>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>     AND default_time_to_live = 0
>>>>>     AND gc_grace_seconds = 864000
>>>>>     AND max_index_interval = 2048
>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>     AND min_index_interval = 128
>>>>>     AND read_repair_chance = 0.0
>>>>>     AND speculative_retry = '99.0PERCENTILE';
>>>>>
>>>>> CREATE INDEX idx_deleted ON app_10001.daily_challenges (deleted);
>>>>>
>>>>> 2. I don't know - how do I check? As I mentioned, I just installed the
>>>>> dsc21 package (ver 2.1.7) from DataStax's debian repo.
>>>>>
>>>>> Really appreciate your help.
>>>>>
>>>>> Thanks,
>>>>> Kunal
>>>>>
>>>>> On 10 July 2015 at 23:33, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>>> 1. You want to look at # of sstables in cfhistograms, or in cfstats
>>>>>> look at:
>>>>>> Compacted partition maximum bytes
>>>>>> Maximum live cells per slice
>>>>>>
>>>>>> 2) No. Here's the env.sh from 3.0, which should work with some tweaks:
>>>>>> https://github.com/tobert/cassandra/blob/0f70469985d62aeadc20b41dc9cdc9d72a035c64/conf/cassandra-env.sh
>>>>>>
>>>>>> You'll at least have to modify the jamm version to what's in yours.
>>>>>> I think it's 2.5.
>>>>>>
>>>>>> All the best,
>>>>>>
>>>>>> Sebastián Estévez
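The two cfstats numbers Sebastian points at can be checked mechanically across a keyspace. Below is a hedged sketch that scans `nodetool cfstats` text for tables whose "Compacted partition maximum bytes" exceeds a threshold; note the exact cfstats layout varies between Cassandra versions ("Table:" vs. "Column Family:" headers), so the sample text and regexes here are illustrative assumptions, not guaranteed to match every release.

```python
import re

# Assumption: ~100MB as the comfort limit for partition size,
# per Sebastian's "under 100mb when possible" rule of thumb.
MAX_PARTITION_BYTES = 100 * 1024 * 1024

def flag_large_partitions(cfstats_text: str, limit: int = MAX_PARTITION_BYTES):
    """Return (table, max_partition_bytes) pairs whose
    'Compacted partition maximum bytes' exceeds `limit`."""
    flagged = []
    table = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        m = re.match(r"Table(?: \(index\))?: (\S+)", line)
        if m:
            table = m.group(1)
            continue
        m = re.match(r"Compacted partition maximum bytes: (\d+)", line)
        if m and table and int(m.group(1)) > limit:
            flagged.append((table, int(m.group(1))))
    return flagged

# Illustrative sample mirroring the numbers reported in this thread:
sample = """\
Table: daily_challenges
Compacted partition maximum bytes: 460000000
Table: daily_guest_logins
Compacted partition maximum bytes: 36000000
"""
print(flag_large_partitions(sample))  # [('daily_challenges', 460000000)]
```

On the thread's numbers, daily_challenges (~460MB) is flagged while daily_guest_logins (~36MB) passes, matching Sebastian's assessment.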
>>>>>>
>>>>>> On Fri, Jul 10, 2015 at 1:42 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>> Thanks, Sebastian.
>>>>>>>
>>>>>>> Couple of questions (I'm really new to cassandra):
>>>>>>> 1. How do I interpret the output of 'nodetool cfstats' to figure out
>>>>>>> the issues? Any documentation pointer on that would be helpful.
>>>>>>>
>>>>>>> 2. I'm primarily a python/c developer - so, totally clueless about
>>>>>>> the JVM environment. Please bear with me, as I will need a lot of
>>>>>>> hand-holding. Should I just copy+paste the settings you gave and try
>>>>>>> to restart the failing cassandra server?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kunal
>>>>>>>
>>>>>>> On 10 July 2015 at 22:35, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>>>>>>>> #1 You need more information.
>>>>>>>>
>>>>>>>> a) Take a look at your .hprof file (memory heap from the OOM) with
>>>>>>>> an introspection tool like jhat, visualvm, or Java Flight Recorder,
>>>>>>>> and see what is using up your RAM.
>>>>>>>>
>>>>>>>> b) How big are your large rows (use nodetool cfstats on each node)?
>>>>>>>> If your data model is bad, you are going to have to re-design it no
>>>>>>>> matter what.
>>>>>>>>
>>>>>>>> #2 As a possible workaround, try using the G1 garbage collector with
>>>>>>>> the settings from C* 3.0 instead of CMS. I've seen lots of success
>>>>>>>> with it lately (tl;dr G1GC is much simpler than CMS and almost as
>>>>>>>> good as a finely tuned CMS). *Note:* Use it with the latest Java 8
>>>>>>>> from Oracle. Do *not* set the newgen size; G1 sets it dynamically:
>>>>>>>>
>>>>>>>>> # min and max heap sizes should be set to the same value to avoid
>>>>>>>>> # stop-the-world GC pauses during resize, and so that we can lock the
>>>>>>>>> # heap in memory on startup to prevent any of it from being swapped
>>>>>>>>> # out.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>>>>>>>>> JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
>>>>>>>>>
>>>>>>>>> # Per-thread stack size.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -Xss256k"
>>>>>>>>>
>>>>>>>>> # Use the Hotspot garbage-first collector.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>>>>>>>>>
>>>>>>>>> # Have the JVM do less remembered set work during STW, instead
>>>>>>>>> # preferring concurrent GC. Reduces p99.9 latency.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
>>>>>>>>>
>>>>>>>>> # The JVM maximum is 8 PGC threads and 1/4 of that for ConcGC.
>>>>>>>>> # Machines with > 10 cores may need additional threads.
>>>>>>>>> # Increase to <= full cores (do not count HT cores).
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16"
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"
>>>>>>>>>
>>>>>>>>> # Main G1GC tunable: lowering the pause target will lower
>>>>>>>>> # throughput and vice versa.
>>>>>>>>> # 200ms is the JVM default and lowest viable setting;
>>>>>>>>> # 1000ms increases throughput. Keep it smaller than the timeouts
>>>>>>>>> # in cassandra.yaml.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>>>>>>>>>
>>>>>>>>> # Do reference processing in parallel GC.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+ParallelRefProcEnabled"
>>>>>>>>>
>>>>>>>>> # This may help eliminate STW.
>>>>>>>>> # The default in Hotspot 8u40 is 40%.
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
>>>>>>>>>
>>>>>>>>> # For workloads that do large allocations, increasing the region
>>>>>>>>> # size may make things more efficient. Otherwise, let the JVM
>>>>>>>>> # set this automatically.
>>>>>>>>> #JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
>>>>>>>>>
>>>>>>>>> # Make sure all memory is faulted and zeroed on startup.
>>>>>>>>> # This helps prevent soft faults in containers and makes
>>>>>>>>> # transparent hugepage allocation more effective.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
>>>>>>>>>
>>>>>>>>> # Biased locking does not benefit Cassandra.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>>>>>>>>>
>>>>>>>>> # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
>>>>>>>>>
>>>>>>>>> # Enable thread-local allocation blocks and allow the JVM to
>>>>>>>>> # automatically resize them at runtime.
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
>>>>>>>>>
>>>>>>>>> # http://www.evanjones.ca/jvm-mmap-pause.html
>>>>>>>>> JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>>
>>>>>>>> Sebastián Estévez
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2015 at 12:55 PM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>> I upgraded my instance from 8GB to a 14GB one.
>>>>>>>>> Allocated 8GB to the JVM heap in cassandra-env.sh.
>>>>>>>>>
>>>>>>>>> And now, it crashes even faster with an OOM.
>>>>>>>>>
>>>>>>>>> Earlier, with a 4GB heap, I could get up to ~90% replication
>>>>>>>>> completion (as reported by nodetool netstats); now, with an 8GB
>>>>>>>>> heap, I cannot even get there. I've already restarted the cassandra
>>>>>>>>> service 4 times with the 8GB heap.
>>>>>>>>>
>>>>>>>>> No clue what's going on. :(
>>>>>>>>>
>>>>>>>>> Kunal
>>>>>>>>>
>>>>>>>>> On 10 July 2015 at 17:45, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>>> You, and only you, are responsible for knowing your data and data
>>>>>>>>>> model.
>>>>>>>>>>
>>>>>>>>>> If columns per row or rows per partition can be large, then an 8GB
>>>>>>>>>> system is probably too small. But the real issue is that you need
>>>>>>>>>> to keep your partition size from getting too large.
>>>>>>>>>>
>>>>>>>>>> Generally, an 8GB system is okay, but only for reasonably-sized
>>>>>>>>>> partitions, like under 10MB.
>>>>>>>>>>
>>>>>>>>>> -- Jack Krupansky
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 10, 2015 at 8:05 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>> I'm new to cassandra.
>>>>>>>>>>> How do I find those out? - mainly, the partition params that you
>>>>>>>>>>> asked for. Others, I think I can figure out.
>>>>>>>>>>>
>>>>>>>>>>> We don't have any large objects/blobs in the column values - it's
>>>>>>>>>>> all textual, date-time, numeric and uuid data.
>>>>>>>>>>>
>>>>>>>>>>> We use cassandra primarily to store segmentation data - with
>>>>>>>>>>> segment type as the partition key. That is again divided into two
>>>>>>>>>>> separate column families, but they have similar structure.
>>>>>>>>>>>
>>>>>>>>>>> Columns per row can be fairly large - each segment type as the
>>>>>>>>>>> row key and associated user ids and timestamp as column value.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kunal
>>>>>>>>>>>
>>>>>>>>>>> On 10 July 2015 at 16:36, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>>>>>>>> What does your data and data model look like - partition size,
>>>>>>>>>>>> rows per partition, number of columns per row, any large
>>>>>>>>>>>> values/blobs in column values?
>>>>>>>>>>>>
>>>>>>>>>>>> You could run fine on an 8GB system, but only if your rows and
>>>>>>>>>>>> partitions are reasonably small. Any large partitions could blow
>>>>>>>>>>>> you away.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Jack Krupansky
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 10, 2015 at 4:22 AM, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>> Attaching the stack dump captured from the last OOM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kunal
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10 July 2015 at 13:32, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>>> Forgot to mention: the data size is not that big - it's barely
>>>>>>>>>>>>>> 10GB in all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kunal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10 July 2015 at 13:29, Kunal Gangakhedkar <kgangakhed...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a 2-node setup on Azure (East US region) running
>>>>>>>>>>>>>>> Ubuntu Server 14.04 LTS. Both nodes have 8GB RAM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One of the nodes (the seed node) died with an OOM - so I am
>>>>>>>>>>>>>>> trying to add a replacement node with the same configuration.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is this new node also keeps dying with an OOM -
>>>>>>>>>>>>>>> I've restarted the cassandra service 8-10 times hoping that
>>>>>>>>>>>>>>> it would finish the replication. But it didn't help.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The one node that is still up is happily chugging along.
>>>>>>>>>>>>>>> All nodes have similar configuration - with libjna installed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cassandra is installed from DataStax's debian repo - pkg:
>>>>>>>>>>>>>>> dsc21, version 2.1.7.
>>>>>>>>>>>>>>> I started off with the default configuration - i.e., the
>>>>>>>>>>>>>>> default cassandra-env.sh - which calculates the heap size
>>>>>>>>>>>>>>> automatically (1/4 * RAM = 2GB).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But that didn't help. So I then tried to increase the heap to
>>>>>>>>>>>>>>> 4GB manually and restarted. It still keeps crashing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any clue as to why it's happening?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Kunal
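For reference, the automatic heap sizing Kunal describes ("1/4 * RAM = 2GB") comes from the stock cassandra-env.sh arithmetic. The sketch below assumes the 2.x formula is max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)) - verify against your own env.sh before relying on it.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Mirror of the assumed stock cassandra-env.sh heap calculation:
    half of RAM capped at 1024MB, quarter of RAM capped at 8192MB,
    whichever is larger."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)

# On Kunal's 8GB node this works out to 2GB, matching the
# "1/4 * RAM = 2GB" he observed:
print(default_max_heap_mb(8192))   # 2048
print(default_max_heap_mb(14336))  # 3584 on the upgraded 14GB instance
```

This is why manually raising MAX_HEAP_SIZE in cassandra-env.sh changes behavior: the automatic formula is deliberately conservative on small boxes, leaving the remainder for the OS page cache and off-heap structures.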