G1GC with an 8g heap may be slower than CMS. Also, you don't typically set a new-gen size with G1.
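To illustrate that last point, here is a minimal sketch of the cassandra-env.sh side, assuming the stock 3.x cassandra-env.sh (which itself advises against setting a young-gen size under G1, since it overrides the pause-time goal):

```
# cassandra-env.sh - sketch only. With -XX:+UseG1GC, set the heap size but
# leave HEAP_NEWSIZE unset so G1 can size the young gen toward its
# pause-time goal; HEAP_NEWSIZE is meant for the CMS/ParNew setup.
MAX_HEAP_SIZE="8192M"
# HEAP_NEWSIZE="800M"   # only relevant if switching back to CMS
```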
Again though - what problem are you solving here? If you're serving reads and sitting under 50% CPU, it's not clear to me what you're trying to fix. Tombstones scanned won't matter for your table, so if that's your only concern, I'd ignore it.

--
Jeff Jirsa

> On Feb 23, 2019, at 7:26 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
> 
> ```
> jvm settings:
> 
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+AlwaysPreTouch
> -XX:-UseBiasedLocking
> -XX:+UseTLAB
> -XX:+ResizeTLAB
> -XX:+UseNUMA
> -XX:+PerfDisableSharedMem
> -Djava.net.preferIPv4Stack=true
> -XX:+UseG1GC
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:MaxGCPauseMillis=500
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
> 
> Total memory (free):
>              total       used       free     shared    buffers     cached
> Mem:      16434004   16125340     308664         60     172872    5565184
> -/+ buffers/cache:   10387284    6046720
> Swap:            0          0          0
> 
> Heap settings in cassandra-env.sh:
> MAX_HEAP_SIZE="8192M"
> HEAP_NEWSIZE="800M"
> ```
> 
>> On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>> Thanks Jeff,
>> 
>> Since we have low writes and high reads, the data is in memtables only most of the time. When I initially noticed the issue there were no SSTables on disk; everything was in the memtable.
>> 
>>> On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>> Also, given your short TTL and low write rate, you may want to think about how you can keep more in memory - this may mean a larger memtable and higher flush thresholds (reading from the memtable), or perhaps the partition cache (if you are likely to read the same key multiple times). You'll also probably win some with basic perf and GC tuning, but can't really do that via email. CASSANDRA-8150 has some pointers.
>>> 
>>> --
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> 
>>>> You'll only ever have one tombstone per read, so your load is based on the normal read rate, not tombstones. The metric isn't wrong, but it's not indicative of a problem here given your data model.
>>>> 
>>>> You're using STCS, so you may be reading from more than one sstable if you update column2 for a given column1; otherwise you're probably just seeing normal read load. Consider dropping your compression chunk size a bit (given the sizes in your cfstats I'd probably go to 4K instead of 64K), and maybe consider LCS or TWCS instead of STCS (which one is appropriate depends on a lot of factors, but STCS is probably causing a fair bit of unnecessary compaction and is probably very slow to expire data). (A CQL sketch of both changes follows at the end of this quote chain.)
>>>> 
>>>> --
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>>> On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>> 
>>>>> Do you see anything wrong with this metric?
>>>>> 
>>>>> Metric to scan tombstones:
>>>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>>> 
>>>>> And at the same time, CPU spikes to 50% whenever I see a high-tombstone alert.
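For concreteness, the chunk-size and TWCS changes suggested above might look like the CQL below. This is a sketch only: the table name is the placeholder used elsewhere in this thread, and the one-hour compaction window is an assumed value picked to fit a 4-hour TTL, not something prescribed in the thread.

```
-- Sketch: smaller compression chunks and TWCS for short-TTL data.
-- Table name is the thread's placeholder; window size is an assumption.
ALTER TABLE keyspace."table"
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'}
   AND compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '1'};
```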
>>>>> 
>>>>>> On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> Your schema is such that you'll never read more than one tombstone per select (unless you're also doing range reads / table scans that you didn't mention) - I'm not quite sure what you're alerting on, but you're not going to have tombstone problems with that table / that select.
>>>>>> 
>>>>>> --
>>>>>> Jeff Jirsa
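To make that point concrete, here is an illustrative sketch against the schema quoted below; the point read is the thread's actual query shape, while the unbounded scan is hypothetical:

```
-- Point read on the partition key: touches a single row, so at most one
-- tombstone can be scanned per select.
SELECT * FROM keyspace."table" WHERE "column1" = ?;

-- Hypothetical range read / table scan: walks many partitions and can
-- scan one tombstone per expired row it passes over.
SELECT * FROM keyspace."table" LIMIT 1000;
```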
>>>>>>> On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Changing gcgs didn't help.
>>>>>>> 
>>>>>>> CREATE KEYSPACE ksname WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;
>>>>>>> 
>>>>>>> ```
>>>>>>> CREATE TABLE keyspace."table" (
>>>>>>>     "column1" text PRIMARY KEY,
>>>>>>>     "column2" text
>>>>>>> ) WITH bloom_filter_fp_chance = 0.01
>>>>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>>>     AND comment = ''
>>>>>>>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>>>>>>>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>     AND crc_check_chance = 1.0
>>>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>>>     AND default_time_to_live = 18000
>>>>>>>     AND gc_grace_seconds = 60
>>>>>>>     AND max_index_interval = 2048
>>>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>>>     AND min_index_interval = 128
>>>>>>>     AND read_repair_chance = 0.0
>>>>>>>     AND speculative_retry = '99PERCENTILE';
>>>>>>> ```
>>>>>>> 
>>>>>>> Flushed the table and took an sstabledump:
>>>>>>> 
>>>>>>> grep -i '"expired" : true' SSTables.txt | wc -l
>>>>>>> 16439
>>>>>>> grep -i '"expired" : false' SSTables.txt | wc -l
>>>>>>> 2657
>>>>>>> 
>>>>>>> TTL is 4 hours:
>>>>>>> 
>>>>>>> INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) USING TTL ?;  -- TTL bound to 14400 (4 hours)
>>>>>>> SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
>>>>>>> 
>>>>>>> Metric to scan tombstones:
>>>>>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>>>>> 
>>>>>>> During peak hours we only have a couple of hundred inserts and 5-8k reads/s per node.
>>>>>>> 
>>>>>>> ```
>>>>>>> tablestats
>>>>>>> Read Count: 605231874
>>>>>>> Read Latency: 0.021268529760215503 ms.
>>>>>>> Write Count: 2763352
>>>>>>> Write Latency: 0.027924007871599422 ms.
>>>>>>> Pending Flushes: 0
>>>>>>>     Table: name
>>>>>>>     SSTable count: 1
>>>>>>>     Space used (live): 1413203
>>>>>>>     Space used (total): 1413203
>>>>>>>     Space used by snapshots (total): 0
>>>>>>>     Off heap memory used (total): 28813
>>>>>>>     SSTable Compression Ratio: 0.5015090954531143
>>>>>>>     Number of partitions (estimate): 19568
>>>>>>>     Memtable cell count: 573
>>>>>>>     Memtable data size: 22971
>>>>>>>     Memtable off heap memory used: 0
>>>>>>>     Memtable switch count: 6
>>>>>>>     Local read count: 529868919
>>>>>>>     Local read latency: 0.020 ms
>>>>>>>     Local write count: 2707371
>>>>>>>     Local write latency: 0.024 ms
>>>>>>>     Pending flushes: 0
>>>>>>>     Percent repaired: 0.0
>>>>>>>     Bloom filter false positives: 1
>>>>>>>     Bloom filter false ratio: 0.00000
>>>>>>>     Bloom filter space used: 23888
>>>>>>>     Bloom filter off heap memory used: 23880
>>>>>>>     Index summary off heap memory used: 4717
>>>>>>>     Compression metadata off heap memory used: 216
>>>>>>>     Compacted partition minimum bytes: 73
>>>>>>>     Compacted partition maximum bytes: 124
>>>>>>>     Compacted partition mean bytes: 99
>>>>>>>     Average live cells per slice (last five minutes): 1.0
>>>>>>>     Maximum live cells per slice (last five minutes): 1
>>>>>>>     Average tombstones per slice (last five minutes): 1.0
>>>>>>>     Maximum tombstones per slice (last five minutes): 1
>>>>>>>     Dropped Mutations: 0
>>>>>>> 
>>>>>>> histograms
>>>>>>> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>>>>>>>                       (micros)       (micros)      (bytes)
>>>>>>> 50%             0.00          20.50         17.08              86           1
>>>>>>> 75%             0.00          24.60         20.50             124           1
>>>>>>> 95%             0.00          35.43         29.52             124           1
>>>>>>> 98%             0.00          35.43         42.51             124           1
>>>>>>> 99%             0.00          42.51         51.01             124           1
>>>>>>> Min             0.00           8.24          5.72              73           0
>>>>>>> Max             1.00          42.51        152.32             124           1
>>>>>>> ```
>>>>>>> 
>>>>>>> 3 nodes in dc1 and 3 nodes in dc2, instance type AWS EC2 m4.xlarge.
>>>>>>> 
>>>>>>>> On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>> Would also be good to see your schema (anonymized if needed) and the select queries you're running.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jeff Jirsa
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Jeff,
>>>>>>>>> 
>>>>>>>>> I have gcgs set to 10 minutes and also changed the table-level TTL to 5 hours, versus the insert TTL of 4 hours. Tracing doesn't show any tombstone scans for the reads, and the log doesn't show tombstone-scan alerts either. Yet as reads hit 5-8k per node during peak hours, it shows a 1M tombstone-scan count.
>>>>>>>>> 
>>>>>>>>>> On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>>>>>> If all of your data is TTL'd and you never explicitly delete a cell without using a TTL, you can probably drop your GCGS to 1 hour (or less).
>>>>>>>>>> 
>>>>>>>>>> Which compaction strategy are you using? You need a way to clear out those tombstones. There exist tombstone compaction subproperties that can help encourage compaction to grab sstables just because they're full of tombstones, which will probably help you.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jeff Jirsa
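A minimal CQL sketch of those last two suggestions (lower GCGS plus the STCS tombstone subproperties); all values here are illustrative assumptions, not tested recommendations:

```
-- Sketch: drop gc_grace_seconds for purely TTL'd data and let STCS pick up
-- tombstone-heavy sstables. Threshold/interval values are assumptions.
ALTER TABLE keyspace."table"
  WITH gc_grace_seconds = 3600
   AND compaction = {'class': 'SizeTieredCompactionStrategy',
                     'tombstone_threshold': '0.2',
                     'tombstone_compaction_interval': '3600',
                     'unchecked_tombstone_compaction': 'true'};
```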
>>>>>>>>>>> On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Can we see the histogram? Why wouldn't you at times have that many tombstones? It makes sense.
>>>>>>>>>>> 
>>>>>>>>>>> Kenneth Brotman
>>>>>>>>>>> 
>>>>>>>>>>> From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>>>>>>>>>>> Sent: Thursday, February 21, 2019 7:06 AM
>>>>>>>>>>> To: user@cassandra.apache.org
>>>>>>>>>>> Subject: Tombstones in memtable
>>>>>>>>>>> 
>>>>>>>>>>> We have a small table; records are about 5k.
>>>>>>>>>>> 
>>>>>>>>>>> All the inserts come with a 4-hour TTL, the table-level TTL is 1 day, and gc_grace_seconds is 3 hours. We do 5k reads a second during peak load, and during peak load we are seeing alerts for the tombstone-scanned histogram reaching a million.
>>>>>>>>>>> 
>>>>>>>>>>> Cassandra version is 3.11.1. Please let me know how these tombstone scans can be avoided in the memtable.
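For reference, the expired/live counts quoted earlier in the thread can be reproduced with a flush plus sstabledump. A sketch, assuming a stock 3.11 layout (the keyspace/table names and the data path are placeholders, not values from the thread):

```
# Sketch: flush the memtable, dump the flushed sstable to JSON, then count
# expired vs. live cells. Path and names below are assumptions; point
# sstabledump at the single Data.db file that the flush produced.
nodetool flush mykeyspace tablename
sstabledump /var/lib/cassandra/data/mykeyspace/tablename-*/mc-1-big-Data.db > SSTables.txt
grep -ci '"expired" : true' SSTables.txt    # cells already past their TTL
grep -ci '"expired" : false' SSTables.txt   # TTL'd cells still live
```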