JVM settings:

```
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003
-XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB
-XX:+UseNUMA -XX:+PerfDisableSharedMem -Djava.net.preferIPv4Stack=true
-XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
```

Total memory (free):

```
             total       used       free     shared    buffers     cached
Mem:      16434004   16125340     308664         60     172872    5565184
-/+ buffers/cache:   10387284    6046720
Swap:            0          0          0
```

Heap settings in cassandra-env.sh:

```
MAX_HEAP_SIZE="8192M"
HEAP_NEWSIZE="800M"
```

On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com> wrote:

> Thanks Jeff,
>
> Since we have low writes and high reads, most of the time the data is in
> memtables only. When I initially noticed the issue there were no sstables
> on disk; everything was in the memtable only.
>
> On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Also, given your short ttl and low write rate, you may want to think about
>> how you can keep more in memory - this may mean larger memtables and
>> higher flush thresholds (reading from the memtable), or perhaps the
>> partition cache (if you are likely to read the same key multiple times).
>> You'll also probably win some with basic perf and GC tuning, but can't
>> really do that via email. CASSANDRA-8150 has some pointers.
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> You'll only ever have one tombstone per read, so your load is based on the
>> normal read rate, not tombstones. The metric isn't wrong, but it's not
>> indicative of a problem here given your data model.
>>
>> You're using STCS, so you may be reading from more than one sstable if you
>> update column2 for a given column1; otherwise you're probably just seeing
>> normal read load. Consider dropping your compression chunk size a bit
>> (given the sizes in your cfstats I'd probably go to 4K instead of 64K),
>> and maybe consider LCS or TWCS instead of STCS (which is appropriate
>> depends on a lot of factors, but STCS is probably causing a fair bit of
>> unnecessary compaction and is probably very slow to expire data).
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com>
>> wrote:
>>
>> Do you see anything wrong with this metric?
>>
>> Metric to scan tombstones:
>>
>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>
>> And at the same time CPU spikes to 50% whenever I see the high tombstone
>> alert.
>>
>> On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Your schema is such that you'll never read more than one tombstone per
>>> select (unless you're also doing range reads / table scans that you
>>> didn't mention) - I'm not quite sure what you're alerting on, but you're
>>> not going to have tombstone problems with that table / that select.
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com>
>>> wrote:
>>>
>>> Changing gcgs didn't help.
>>>
>>> CREATE KEYSPACE ksname WITH replication = {'class':
>>> 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes =
>>> true;
>>>
>>> ```
>>> CREATE TABLE keyspace."table" (
>>>     "column1" text PRIMARY KEY,
>>>     "column2" text
>>> ) WITH bloom_filter_fp_chance = 0.01
>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>     AND comment = ''
>>>     AND compaction = {'class':
>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>     AND crc_check_chance = 1.0
>>>     AND dclocal_read_repair_chance = 0.1
>>>     AND default_time_to_live = 18000
>>>     AND gc_grace_seconds = 60
>>>     AND max_index_interval = 2048
>>>     AND memtable_flush_period_in_ms = 0
>>>     AND min_index_interval = 128
>>>     AND read_repair_chance = 0.0
>>>     AND speculative_retry = '99PERCENTILE';
>>>
>>> Flushed the table and took an sstabledump:
>>>
>>> grep -i '"expired" : true' SSTables.txt | wc -l
>>> 16439
>>> grep -i '"expired" : false' SSTables.txt | wc -l
>>> 2657
>>>
>>> ttl is 4 hours.
>>>
>>> INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?)
>>> USING TTL ?;   -- TTL bound as 14400 (4 hours)
>>> SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
>>>
>>> Metric to scan tombstones:
>>>
>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>
>>> During peak hours we only have a couple of hundred inserts and 5-8k
>>> reads/s per node.
>>> ```
>>>
>>> ```
>>> tablestats
>>> Read Count: 605231874
>>> Read Latency: 0.021268529760215503 ms.
>>> Write Count: 2763352
>>> Write Latency: 0.027924007871599422 ms.
>>> Pending Flushes: 0
>>> Table: name
>>> SSTable count: 1
>>> Space used (live): 1413203
>>> Space used (total): 1413203
>>> Space used by snapshots (total): 0
>>> Off heap memory used (total): 28813
>>> SSTable Compression Ratio: 0.5015090954531143
>>> Number of partitions (estimate): 19568
>>> Memtable cell count: 573
>>> Memtable data size: 22971
>>> Memtable off heap memory used: 0
>>> Memtable switch count: 6
>>> Local read count: 529868919
>>> Local read latency: 0.020 ms
>>> Local write count: 2707371
>>> Local write latency: 0.024 ms
>>> Pending flushes: 0
>>> Percent repaired: 0.0
>>> Bloom filter false positives: 1
>>> Bloom filter false ratio: 0.00000
>>> Bloom filter space used: 23888
>>> Bloom filter off heap memory used: 23880
>>> Index summary off heap memory used: 4717
>>> Compression metadata off heap memory used: 216
>>> Compacted partition minimum bytes: 73
>>> Compacted partition maximum bytes: 124
>>> Compacted partition mean bytes: 99
>>> Average live cells per slice (last five minutes): 1.0
>>> Maximum live cells per slice (last five minutes): 1
>>> Average tombstones per slice (last five minutes): 1.0
>>> Maximum tombstones per slice (last five minutes): 1
>>> Dropped Mutations: 0
>>>
>>> histograms
>>> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>>>                            (micros)      (micros)         (bytes)
>>> 50%             0.00          20.50         17.08              86           1
>>> 75%             0.00          24.60         20.50             124           1
>>> 95%             0.00          35.43         29.52             124           1
>>> 98%             0.00          35.43         42.51             124           1
>>> 99%             0.00          42.51         51.01             124           1
>>> Min             0.00           8.24          5.72              73           0
>>> Max             1.00          42.51        152.32             124           1
>>> ```
>>>
>>> 3 nodes in dc1 and 3 nodes in dc2 in the cluster, with AWS EC2 instance
>>> type m4.xlarge.
>>>
>>> On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> Would also be good to see your schema (anonymized if needed) and the
>>>> select queries you're running.
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com>
>>>> wrote:
>>>>
>>>> Thanks Jeff,
>>>>
>>>> I have gcgs set to 10 mins and also changed the table ttl to 5 hours,
>>>> compared to the insert ttl of 4 hours. Tracing doesn't show any
>>>> tombstone scans for the reads, and the log doesn't show tombstone scan
>>>> alerts either. Yet while reads are happening at 5-8k reads/s per node
>>>> during the peak hours, the metric shows a tombstone scan count reaching
>>>> 1M.
>>>>
>>>> On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>> If all of your data is TTL'd and you never explicitly delete a cell
>>>>> without using a TTL, you can probably drop your GCGS to 1 hour (or
>>>>> less).
>>>>>
>>>>> Which compaction strategy are you using? You need a way to clear out
>>>>> those tombstones. There exist tombstone compaction sub-properties that
>>>>> can help encourage compaction to grab sstables just because they're
>>>>> full of tombstones, which will probably help you.
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Feb 22, 2019, at 8:37 AM, Kenneth Brotman
>>>>> <kenbrot...@yahoo.com.invalid> wrote:
>>>>>
>>>>> Can we see the histogram? Why wouldn't you at times have that many
>>>>> tombstones? Makes sense.
>>>>>
>>>>>
>>>>> Kenneth Brotman
>>>>>
>>>>>
>>>>> *From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>>>>> *Sent:* Thursday, February 21, 2019 7:06 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Tombstones in memtable
>>>>>
>>>>>
>>>>> We have a small table; records are about 5k.
>>>>>
>>>>> All the inserts come with a 4 hr ttl, the table-level ttl is 1 day, and
>>>>> gc grace seconds is 3 hours. We do 5k reads a second during peak load,
>>>>> and during the peak load we are seeing alerts for the tombstone scanned
>>>>> histogram reaching a million.
>>>>>
>>>>> Cassandra version is 3.11.1. Please let me know how these tombstone
>>>>> scans can be avoided in the memtable.
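
For anyone landing on this thread later: Jeff's suggestions (smaller compression chunks, a more tombstone-friendly compaction setup) would look roughly like the CQL below, written against the anonymized schema from earlier in the thread. This is only a sketch: the 4K chunk size comes from Jeff's message, but the TWCS window size and the tombstone threshold values are illustrative assumptions to validate against your own workload before applying.

```
-- Sketch only, against the anonymized table above.
-- Smaller compression chunks (Jeff suggested 4K instead of 64K):
ALTER TABLE keyspace."table"
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};

-- TWCS tends to expire TTL'd data much faster than STCS; the 1-hour
-- window is an assumption for a 4-hour TTL, not from the thread:
ALTER TABLE keyspace."table"
    WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                       'compaction_window_unit': 'HOURS',
                       'compaction_window_size': '1'};

-- Or, staying on STCS, the tombstone compaction sub-properties Jeff
-- mentioned (values here are illustrative defaults to tune):
ALTER TABLE keyspace."table"
    WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                       'tombstone_threshold': '0.2',
                       'unchecked_tombstone_compaction': 'true'};
```

Since each ALTER replaces the whole compaction map, pick one of the two compaction variants rather than running both.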