G1GC with an 8g heap may be slower than CMS. Also, you don't typically set a new-gen size with G1.
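To illustrate that last point, here is a minimal sketch of the cassandra-env.sh side, assuming the stock 3.x cassandra-env.sh (which itself advises against setting a young-gen size under G1, since it overrides the pause-time goal):

```
# cassandra-env.sh - sketch only. With -XX:+UseG1GC, set the heap size but
# leave HEAP_NEWSIZE unset so G1 can size the young gen toward its
# pause-time goal; HEAP_NEWSIZE is meant for the CMS/ParNew setup.
MAX_HEAP_SIZE="8192M"
# HEAP_NEWSIZE="800M"   # only relevant if switching back to CMS
```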
Again though - what problem are you solving here? If you're serving reads and sitting under 50% CPU, it's not clear to me what you're trying to fix. Tombstones scanned won't matter for your table, so if that's your only concern, I'd ignore it.

--
Jeff Jirsa

> On Feb 23, 2019, at 7:26 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
> 
> ```
> jvm settings:
> 
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+AlwaysPreTouch
> -XX:-UseBiasedLocking
> -XX:+UseTLAB
> -XX:+ResizeTLAB
> -XX:+UseNUMA
> -XX:+PerfDisableSharedMem
> -Djava.net.preferIPv4Stack=true
> -XX:+UseG1GC
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:MaxGCPauseMillis=500
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> 
> Total memory (free):
>              total       used       free     shared    buffers     cached
> Mem:      16434004   16125340     308664         60     172872    5565184
> -/+ buffers/cache:   10387284    6046720
> Swap:            0          0          0
> 
> Heap settings in cassandra-env.sh:
> MAX_HEAP_SIZE="8192M"
> HEAP_NEWSIZE="800M"
> ```
> 
>> On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>> Thanks Jeff,
>> 
>> Since we have low writes and high reads, the data is in memtables only most of the time. When I initially noticed the issue there were no SSTables on disk; everything was in the memtable.
>> 
>>> On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>> Also, given your short TTL and low write rate, you may want to think about how you can keep more in memory - this may mean a larger memtable and higher flush thresholds (reading from the memtable), or perhaps the partition cache (if you are likely to read the same key multiple times). You'll also probably win some with basic perf and GC tuning, but can't really do that via email. CASSANDRA-8150 has some pointers.
>>> 
>>> --
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> 
>>>> You'll only ever have one tombstone per read, so your load is based on the normal read rate, not tombstones. The metric isn't wrong, but it's not indicative of a problem here given your data model.
>>>> 
>>>> You're using STCS, so you may be reading from more than one sstable if you update column2 for a given column1; otherwise you're probably just seeing normal read load. Consider dropping your compression chunk size a bit (given the sizes in your cfstats I'd probably go to 4K instead of 64K), and maybe consider LCS or TWCS instead of STCS (which one is appropriate depends on a lot of factors, but STCS is probably causing a fair bit of unnecessary compaction and is probably very slow to expire data). (A CQL sketch of both changes follows at the end of this quote chain.)
>>>> 
>>>> --
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>>> On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>> 
>>>>> Do you see anything wrong with this metric?
>>>>> 
>>>>> Metric to scan tombstones:
>>>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>>> 
>>>>> And at the same time, CPU spikes to 50% whenever I see a high-tombstone alert.
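For concreteness, the chunk-size and TWCS changes suggested above might look like the CQL below. This is a sketch only: the table name is the placeholder used elsewhere in this thread, and the one-hour compaction window is an assumed value picked to fit a 4-hour TTL, not something prescribed in the thread.

```
-- Sketch: smaller compression chunks and TWCS for short-TTL data.
-- Table name is the thread's placeholder; window size is an assumption.
ALTER TABLE keyspace."table"
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'}
   AND compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '1'};
```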
>>>>> 
>>>>>> On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> Your schema is such that you'll never read more than one tombstone per select (unless you're also doing range reads / table scans that you didn't mention) - I'm not quite sure what you're alerting on, but you're not going to have tombstone problems with that table / that select.
>>>>>> 
>>>>>> --
>>>>>> Jeff Jirsa
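To make that point concrete, here is an illustrative sketch against the schema quoted below; the point read is the thread's actual query shape, while the unbounded scan is hypothetical:

```
-- Point read on the partition key: touches a single row, so at most one
-- tombstone can be scanned per select.
SELECT * FROM keyspace."table" WHERE "column1" = ?;

-- Hypothetical range read / table scan: walks many partitions and can
-- scan one tombstone per expired row it passes over.
SELECT * FROM keyspace."table" LIMIT 1000;
```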
>>>>>>> On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Changing gcgs didn't help.
>>>>>>> 
>>>>>>> CREATE KEYSPACE ksname WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;
>>>>>>> 
>>>>>>> ```
>>>>>>> CREATE TABLE keyspace."table" (
>>>>>>>     "column1" text PRIMARY KEY,
>>>>>>>     "column2" text
>>>>>>> ) WITH bloom_filter_fp_chance = 0.01
>>>>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>>>     AND comment = ''
>>>>>>>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>>>>>>>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>     AND crc_check_chance = 1.0
>>>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>>>     AND default_time_to_live = 18000
>>>>>>>     AND gc_grace_seconds = 60
>>>>>>>     AND max_index_interval = 2048
>>>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>>>     AND min_index_interval = 128
>>>>>>>     AND read_repair_chance = 0.0
>>>>>>>     AND speculative_retry = '99PERCENTILE';
>>>>>>> ```
>>>>>>> 
>>>>>>> Flushed the table and took an sstabledump:
>>>>>>> 
>>>>>>> grep -i '"expired" : true' SSTables.txt | wc -l
>>>>>>> 16439
>>>>>>> grep -i '"expired" : false' SSTables.txt | wc -l
>>>>>>> 2657
>>>>>>> 
>>>>>>> TTL is 4 hours:
>>>>>>> 
>>>>>>> INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) USING TTL ?;  -- TTL bound to 14400 (4 hours)
>>>>>>> SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
>>>>>>> 
>>>>>>> Metric to scan tombstones:
>>>>>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>>>>> 
>>>>>>> During peak hours we only have a couple of hundred inserts and 5-8k reads/s per node.
>>>>>>> 
>>>>>>> ```
>>>>>>> tablestats
>>>>>>> Read Count: 605231874
>>>>>>> Read Latency: 0.021268529760215503 ms.
>>>>>>> Write Count: 2763352
>>>>>>> Write Latency: 0.027924007871599422 ms.
>>>>>>> Pending Flushes: 0
>>>>>>>     Table: name
>>>>>>>     SSTable count: 1
>>>>>>>     Space used (live): 1413203
>>>>>>>     Space used (total): 1413203
>>>>>>>     Space used by snapshots (total): 0
>>>>>>>     Off heap memory used (total): 28813
>>>>>>>     SSTable Compression Ratio: 0.5015090954531143
>>>>>>>     Number of partitions (estimate): 19568
>>>>>>>     Memtable cell count: 573
>>>>>>>     Memtable data size: 22971
>>>>>>>     Memtable off heap memory used: 0
>>>>>>>     Memtable switch count: 6
>>>>>>>     Local read count: 529868919
>>>>>>>     Local read latency: 0.020 ms
>>>>>>>     Local write count: 2707371
>>>>>>>     Local write latency: 0.024 ms
>>>>>>>     Pending flushes: 0
>>>>>>>     Percent repaired: 0.0
>>>>>>>     Bloom filter false positives: 1
>>>>>>>     Bloom filter false ratio: 0.00000
>>>>>>>     Bloom filter space used: 23888
>>>>>>>     Bloom filter off heap memory used: 23880
>>>>>>>     Index summary off heap memory used: 4717
>>>>>>>     Compression metadata off heap memory used: 216
>>>>>>>     Compacted partition minimum bytes: 73
>>>>>>>     Compacted partition maximum bytes: 124
>>>>>>>     Compacted partition mean bytes: 99
>>>>>>>     Average live cells per slice (last five minutes): 1.0
>>>>>>>     Maximum live cells per slice (last five minutes): 1
>>>>>>>     Average tombstones per slice (last five minutes): 1.0
>>>>>>>     Maximum tombstones per slice (last five minutes): 1
>>>>>>>     Dropped Mutations: 0
>>>>>>> 
>>>>>>> histograms
>>>>>>> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>>>>>>>                       (micros)       (micros)      (bytes)
>>>>>>> 50%             0.00          20.50         17.08              86           1
>>>>>>> 75%             0.00          24.60         20.50             124           1
>>>>>>> 95%             0.00          35.43         29.52             124           1
>>>>>>> 98%             0.00          35.43         42.51             124           1
>>>>>>> 99%             0.00          42.51         51.01             124           1
>>>>>>> Min             0.00           8.24          5.72              73           0
>>>>>>> Max             1.00          42.51        152.32             124           1
>>>>>>> ```
>>>>>>> 
>>>>>>> 3 nodes in dc1 and 3 nodes in dc2, instance type AWS EC2 m4.xlarge.
>>>>>>> 
>>>>>>>> On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>> Would also be good to see your schema (anonymized if needed) and the select queries you're running.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jeff Jirsa
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Jeff,
>>>>>>>>> 
>>>>>>>>> I have gcgs set to 10 minutes and also changed the table-level TTL to 5 hours, versus the insert TTL of 4 hours. Tracing doesn't show any tombstone scans for the reads, and the log doesn't show tombstone-scan alerts either. Yet as reads hit 5-8k per node during peak hours, it shows a 1M tombstone-scan count.
>>>>>>>>> 
>>>>>>>>>> On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>>>> If all of your data is TTL'd and you never explicitly delete a cell without using a TTL, you can probably drop your GCGS to 1 hour (or less).
>>>>>>>>>> 
>>>>>>>>>> Which compaction strategy are you using? You need a way to clear out those tombstones. There exist tombstone compaction subproperties that can help encourage compaction to grab sstables just because they're full of tombstones, which will probably help you.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jeff Jirsa
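A minimal CQL sketch of those last two suggestions (lower GCGS plus the STCS tombstone subproperties); all values here are illustrative assumptions, not tested recommendations:

```
-- Sketch: drop gc_grace_seconds for purely TTL'd data and let STCS pick up
-- tombstone-heavy sstables. Threshold/interval values are assumptions.
ALTER TABLE keyspace."table"
  WITH gc_grace_seconds = 3600
   AND compaction = {'class': 'SizeTieredCompactionStrategy',
                     'tombstone_threshold': '0.2',
                     'tombstone_compaction_interval': '3600',
                     'unchecked_tombstone_compaction': 'true'};
```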
>>>>>>>>>>> On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Can we see the histogram? Why wouldn't you at times have that many tombstones? It makes sense.
>>>>>>>>>>> 
>>>>>>>>>>> Kenneth Brotman
>>>>>>>>>>> 
>>>>>>>>>>> From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>>>>>>>>>>> Sent: Thursday, February 21, 2019 7:06 AM
>>>>>>>>>>> To: user@cassandra.apache.org
>>>>>>>>>>> Subject: Tombstones in memtable
>>>>>>>>>>> 
>>>>>>>>>>> We have a small table; records are about 5k.
>>>>>>>>>>> 
>>>>>>>>>>> All the inserts come with a 4-hour TTL, the table-level TTL is 1 day, and gc_grace_seconds is 3 hours. We do 5k reads a second during peak load, and during peak load we are seeing alerts for the tombstone-scanned histogram reaching a million.
>>>>>>>>>>> 
>>>>>>>>>>> Cassandra version is 3.11.1. Please let me know how these tombstone scans can be avoided in the memtable.
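For reference, the expired/live counts quoted earlier in the thread can be reproduced with a flush plus sstabledump. A sketch, assuming a stock 3.11 layout (the keyspace/table names and the data path are placeholders, not values from the thread):

```
# Sketch: flush the memtable, dump the flushed sstable to JSON, then count
# expired vs. live cells. Path and names below are assumptions; point
# sstabledump at the single Data.db file that the flush produced.
nodetool flush mykeyspace tablename
sstabledump /var/lib/cassandra/data/mykeyspace/tablename-*/mc-1-big-Data.db > SSTables.txt
grep -ci '"expired" : true' SSTables.txt    # cells already past their TTL
grep -ci '"expired" : false' SSTables.txt   # TTL'd cells still live
```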