Re: Tombstones in memtable

Rahul Reddy Sun, 24 Feb 2019 05:37:34 -0800

Thanks Jeff. I'm trying to figure out why the tombstone scans are happening
and, if possible, eliminate them.


On Sat, Feb 23, 2019, 10:50 PM Jeff Jirsa <jji...@gmail.com> wrote:

> G1GC with an 8g heap may be slower than CMS. Also you don’t typically set
> new gen size on G1.
>
> Again though - what problem are you solving here? If you’re serving reads
> and sitting under 50% cpu, it’s not clear to me what you’re trying to fix.
> Tombstones scanned won’t matter for your table, so if that’s your only
> concern, I’d ignore it.
>
>
>
> --
> Jeff Jirsa
>
>
> On Feb 23, 2019, at 7:26 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>
> JVM settings:
>
> ```
>
> -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k
> -XX:StringTableSize=1000003
> -XX:+AlwaysPreTouch
> -XX:-UseBiasedLocking
> -XX:+UseTLAB
> -XX:+ResizeTLAB
> -XX:+UseNUMA
> -XX:+PerfDisableSharedMem
> -Djava.net.preferIPv4Stack=true
> -XX:+UseG1GC
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:MaxGCPauseMillis=500
> -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintPromotionFailure
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=10M
>
> Total memory (output of free, in KB):
>              total       used       free     shared    buffers     cached
> Mem:      16434004   16125340     308664         60     172872    5565184
> -/+ buffers/cache:   10387284    6046720
> Swap:            0          0          0
>
> Heap settings in cassandra-env.sh
> MAX_HEAP_SIZE="8192M"
> HEAP_NEWSIZE="800M"
> ```
>
> On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com>
> wrote:
>
>> Thanks Jeff,
>>
>> Since we have low writes and high reads, most of the time the data is only
>> in the memtables. When I initially noticed the issue, there were no sstables
>> on disk; everything was in the memtable.
>>
>> On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> Also, given your short TTL and low write rate, you may want to think
>>> about how you can keep more in memory: this may mean larger memtables and
>>> higher flush thresholds (so reads are served from the memtable), or perhaps
>>> the partition cache (if you are likely to read the same key multiple times).
>>> You’ll also probably win some with basic perf and GC tuning, but I can’t
>>> really do that via email. CASSANDRA-8150 has some pointers.
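As a rough check of how much data there is to keep in memory, the tablestats figures quoted further down (about 19,568 partitions, mean compacted partition size 99 bytes) can simply be multiplied out. This is a back-of-envelope sketch: OVERHEAD_FACTOR is a hypothetical allowance for on-heap memtable or cache overhead, not a measured Cassandra value.

```python
# Rough working-set estimate for the table, using the tablestats figures
# quoted in this thread. OVERHEAD_FACTOR is a hypothetical allowance for
# on-heap memtable/cache overhead, not a measured Cassandra value.
PARTITIONS = 19_568              # "Number of partitions (estimate)"
MEAN_PARTITION_BYTES = 99        # "Compacted partition mean bytes"
OVERHEAD_FACTOR = 10             # hypothetical
HEAP_BYTES = 8192 * 1024 * 1024  # MAX_HEAP_SIZE="8192M"


def working_set_bytes(partitions: int, mean_bytes: int, overhead: int = 1) -> int:
    """Serialized data size, multiplied by an assumed in-memory overhead."""
    return partitions * mean_bytes * overhead


if __name__ == "__main__":
    raw = working_set_bytes(PARTITIONS, MEAN_PARTITION_BYTES)
    padded = working_set_bytes(PARTITIONS, MEAN_PARTITION_BYTES, OVERHEAD_FACTOR)
    print(f"serialized data: {raw / 2**20:.2f} MiB")
    print(f"with {OVERHEAD_FACTOR}x assumed overhead: {padded / 2**20:.1f} MiB "
          f"({100 * padded / HEAP_BYTES:.2f}% of the 8 GiB heap)")
```

The serialized data comes to under 2 MiB; the overhead factor only shows how much headroom there is under an assumed multiplier, so measure actual memtable and cache usage before sizing anything.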
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>> You’ll only ever have one tombstone per read, so your load is based on
>>> normal read rate not tombstones. The metric isn’t wrong, but it’s not
>>> indicative of a problem here given your data model.
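To see how a five-minute increase() over the histogram's Count can reach the millions from normal traffic alone, multiply the read rate by the window length. This sketch assumes that each local read records one sample in TombstoneScannedHistogram (so Count tracks reads, not tombstones) and that each read scans at most one tombstone, as described above; both are assumptions to check against your own metrics.

```python
# Back-of-envelope: what increase(...TombstoneScannedHistogram Count[5m])
# could look like from normal read traffic alone.
# Assumptions: one histogram sample per local read, and at most one
# tombstone scanned per read (single-row partitions, as in this thread).

WINDOW_SECONDS = 5 * 60  # the [5m] range in the PromQL query


def expected_increase(reads_per_second: float, window_s: int = WINDOW_SECONDS) -> float:
    """Histogram samples (approx. number of reads) accumulated over the window."""
    return reads_per_second * window_s


def max_tombstones(reads_per_second: float, tombstones_per_read: int = 1,
                   window_s: int = WINDOW_SECONDS) -> float:
    """Upper bound on tombstones scanned over the window."""
    return expected_increase(reads_per_second, window_s) * tombstones_per_read


if __name__ == "__main__":
    for rate in (5_000, 8_000):  # peak reads/s per node quoted in the thread
        print(f"{rate} reads/s -> ~{expected_increase(rate):,.0f} samples per 5m, "
              f"<= {max_tombstones(rate):,.0f} tombstones")
```

At the 5-8k reads/s quoted in this thread that is roughly 1.5M to 2.4M per window, the same order as the "million" alerts, which is consistent with the alert following read volume rather than a tombstone buildup.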
>>>
>>> You’re using STCS, so you may be reading from more than one sstable if
>>> you update column2 for a given column1; otherwise you’re probably just
>>> seeing normal read load. Consider dropping your compression chunk size a
>>> bit (given the sizes in your cfstats, I’d probably go to 4K instead of 64K),
>>> and maybe consider LCS or TWCS instead of STCS (which is appropriate
>>> depends on a lot of factors, but STCS is probably causing a fair bit of
>>> unnecessary compaction and is probably very slow to expire data).
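The chunk-size advice can be made concrete with the mean partition size from the tablestats output: a point read decompresses a whole chunk to reach one roughly 99-byte partition. This sketch only compares uncompressed chunk size to partition size; it ignores the compression ratio and page-cache effects.

```python
# Compare how much data one point read has to decompress for different
# compression chunk sizes (chunk_length_in_kb), given the mean partition
# size from tablestats. Simplification: a read decompresses exactly one
# chunk; compression ratio and page-cache hits are ignored.

MEAN_PARTITION_BYTES = 99  # "Compacted partition mean bytes"


def partitions_per_chunk(chunk_kb: int, partition_bytes: int = MEAN_PARTITION_BYTES) -> float:
    """How many average-sized partitions fit in one uncompressed chunk."""
    return chunk_kb * 1024 / partition_bytes


def reduction(old_kb: int, new_kb: int) -> float:
    """Factor by which per-read decompression work shrinks."""
    return old_kb / new_kb


if __name__ == "__main__":
    for kb in (64, 4):
        print(f"{kb}K chunk: ~{partitions_per_chunk(kb):.0f} partitions' worth "
              f"decompressed per read")
    print(f"64K -> 4K: {reduction(64, 4):.0f}x less data decompressed per read")
```

With 64K chunks each read decompresses about 662 partitions' worth of data to return one; with 4K chunks it is about 41.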
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com>
>>> wrote:
>>>
>>> Do you see anything wrong with this metric?
>>>
>>> Metric for scanned tombstones:
>>>
>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>
>>> At the same time, CPU spikes to 50% whenever I see the high-tombstone alert.
>>>
>>> On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> Your schema is such that you’ll never read more than one tombstone per
>>>> select (unless you’re also doing range reads / table scans that you didn’t
>>>> mention) - I’m not quite sure what you’re alerting on, but you’re not going
>>>> to have tombstone problems with that table / that select.
>>>>
>>>> --
>>>> Jeff Jirsa
>>>>
>>>>
>>>> On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com>
>>>> wrote:
>>>>
>>>> Changing gc_grace_seconds didn't help.
>>>>
>>>> CREATE KEYSPACE ksname WITH replication = {'class':
>>>> 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'}  AND durable_writes =
>>>> true;
>>>>
>>>>
>>>> ```CREATE TABLE keyspace."table" (
>>>>     "column1" text PRIMARY KEY,
>>>>     "column2" text
>>>> ) WITH bloom_filter_fp_chance = 0.01
>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>     AND comment = ''
>>>>     AND compaction = {'class':
>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>     AND crc_check_chance = 1.0
>>>>     AND dclocal_read_repair_chance = 0.1
>>>>     AND default_time_to_live = 18000
>>>>     AND gc_grace_seconds = 60
>>>>     AND max_index_interval = 2048
>>>>     AND memtable_flush_period_in_ms = 0
>>>>     AND min_index_interval = 128
>>>>     AND read_repair_chance = 0.0
>>>>     AND speculative_retry = '99PERCENTILE';
>>>>
>>>> Flushed the table and ran sstabledump:
>>>> grep -i '"expired" : true' SSTables.txt|wc -l
>>>> 16439
>>>> grep -i '"expired" : false'  SSTables.txt |wc -l
>>>> 2657
>>>>
>>>> The TTL is 4 hours.
>>>>
>>>> INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?)
>>>> USING TTL 14400;  -- 4 hours
>>>> SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
>>>>
>>>> Metric for scanned tombstones:
>>>>
>>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>>
>>>> During peak hours we have only a couple of hundred inserts and 5-8k
>>>> reads/s per node.
>>>> ```
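The two grep counts above can be combined into a single ratio. This is a minimal Python equivalent of the grep -i ... | wc -l pipeline, assuming the sstabledump output was saved to a text file as in the message; like grep | wc -l, it counts matching lines, not cells.

```python
import re
import sys

# Count lines marking expired vs. live TTL'd data in a saved sstabledump
# output, mirroring `grep -i '"expired" : true' SSTables.txt | wc -l`.
EXPIRED = re.compile(r'"expired" : true', re.IGNORECASE)
LIVE = re.compile(r'"expired" : false', re.IGNORECASE)


def count_expired(lines):
    """Return (expired_lines, live_lines), like the two grep | wc -l commands."""
    expired = live = 0
    for line in lines:
        if EXPIRED.search(line):
            expired += 1
        if LIVE.search(line):
            live += 1
    return expired, live


def expired_fraction(expired: int, live: int) -> float:
    """Share of matched lines that were already expired."""
    total = expired + live
    return expired / total if total else 0.0


if __name__ == "__main__":
    if len(sys.argv) > 1:  # path to a saved sstabledump output
        with open(sys.argv[1], encoding="utf-8") as fh:
            exp, live = count_expired(fh)
    else:
        exp, live = 16439, 2657  # counts posted in this thread
    print(f"expired={exp} live={live} expired share={expired_fraction(exp, live):.1%}")
```

With the counts posted above, about 86% of the matched entries were already expired when the table was dumped.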
>>>>
>>>> tablestats:
>>>> ```
>>>> Read Count: 605231874
>>>> Read Latency: 0.021268529760215503 ms.
>>>> Write Count: 2763352
>>>> Write Latency: 0.027924007871599422 ms.
>>>> Pending Flushes: 0
>>>> Table: name
>>>> SSTable count: 1
>>>> Space used (live): 1413203
>>>> Space used (total): 1413203
>>>> Space used by snapshots (total): 0
>>>> Off heap memory used (total): 28813
>>>> SSTable Compression Ratio: 0.5015090954531143
>>>> Number of partitions (estimate): 19568
>>>> Memtable cell count: 573
>>>> Memtable data size: 22971
>>>> Memtable off heap memory used: 0
>>>> Memtable switch count: 6
>>>> Local read count: 529868919
>>>> Local read latency: 0.020 ms
>>>> Local write count: 2707371
>>>> Local write latency: 0.024 ms
>>>> Pending flushes: 0
>>>> Percent repaired: 0.0
>>>> Bloom filter false positives: 1
>>>> Bloom filter false ratio: 0.00000
>>>> Bloom filter space used: 23888
>>>> Bloom filter off heap memory used: 23880
>>>> Index summary off heap memory used: 4717
>>>> Compression metadata off heap memory used: 216
>>>> Compacted partition minimum bytes: 73
>>>> Compacted partition maximum bytes: 124
>>>> Compacted partition mean bytes: 99
>>>> Average live cells per slice (last five minutes): 1.0
>>>> Maximum live cells per slice (last five minutes): 1
>>>> Average tombstones per slice (last five minutes): 1.0
>>>> Maximum tombstones per slice (last five minutes): 1
>>>> Dropped Mutations: 0
>>>> histograms
>>>> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>>>>                               (micros)          (micros)           (bytes)
>>>> 50%             0.00             20.50             17.08                86                 1
>>>> 75%             0.00             24.60             20.50               124                 1
>>>> 95%             0.00             35.43             29.52               124                 1
>>>> 98%             0.00             35.43             42.51               124                 1
>>>> 99%             0.00             42.51             51.01               124                 1
>>>> Min             0.00              8.24              5.72                73                 0
>>>> Max             1.00             42.51            152.32               124                 1
>>>> ```
>>>>
>>>> The cluster has 3 nodes in dc1 and 3 nodes in dc2, running on AWS EC2
>>>> m4.xlarge instances.
>>>>
>>>> On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>>> It would also be good to see your schema (anonymized if needed) and the
>>>>> select queries you’re running.
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks Jeff,
>>>>>
>>>>> I have gc_grace_seconds set to 10 minutes, and I also changed the table
>>>>> TTL to 5 hours (the insert TTL is 4 hours). With tracing on, the reads
>>>>> don't show any tombstone scans, and the log doesn't show any tombstone scan
>>>>> alerts either. As the reads happen at 5-8k per node during peak hours,
>>>>> the metric shows a count of 1M tombstone scans per read.
>>>>>
>>>>> On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>
>>>>>> If all of your data is TTL’d and you never explicitly delete a cell
>>>>>> without using a TTL, you can probably drop your GCGS (gc_grace_seconds)
>>>>>> to 1 hour (or less).
>>>>>>
>>>>>> Which compaction strategy are you using? You need a way to clear out
>>>>>> those tombstones. There are tombstone compaction subproperties that can
>>>>>> help encourage compaction to grab sstables just because they’re full of
>>>>>> tombstones, which will probably help you.
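The advice on lowering GCGS for TTL-only data rests on a simple timeline: a TTL'd cell stops being returned once its TTL elapses, and its data can only be dropped by a compaction that runs after gc_grace_seconds has also passed. This is a simplified model, not Cassandra's code: it ignores clock skew, repair, and whether compaction actually picks up the sstable, and the class and method names are illustrative only. The values are the ones mentioned in this thread.

```python
from dataclasses import dataclass

# Simplified lifecycle of a TTL'd cell:
#   written at t0 -> expires at t0 + ttl (no longer returned; treated like
#   a tombstone) -> eligible to be purged by a compaction that runs after
#   t0 + ttl + gc_grace_seconds.
# Clock skew, repair and compaction scheduling are deliberately ignored,
# and one-second boundary details are simplified.


@dataclass(frozen=True)
class CellLifecycle:
    written_at: int        # seconds, any common time origin
    ttl: int               # seconds
    gc_grace_seconds: int  # table setting, seconds

    @property
    def expires_at(self) -> int:
        return self.written_at + self.ttl

    @property
    def purgeable_after(self) -> int:
        return self.expires_at + self.gc_grace_seconds

    def state(self, now: int) -> str:
        if now < self.expires_at:
            return "live"
        if now <= self.purgeable_after:
            return "expired, not yet purgeable"
        return "purgeable by the next compaction that includes it"


if __name__ == "__main__":
    HOUR = 3600
    for gcgs in (3 * HOUR, 1 * HOUR, 60):  # GCGS values discussed in the thread
        cell = CellLifecycle(written_at=0, ttl=4 * HOUR, gc_grace_seconds=gcgs)
        print(f"gc_grace_seconds={gcgs}: purgeable after "
              f"{cell.purgeable_after / HOUR:.2f} h")
```

With the 4-hour insert TTL, a 3-hour GCGS keeps expired data around until about 7 hours after the write, versus just over 4 hours with the 60-second setting in the later schema. Either way, the data only disappears when a compaction actually rewrites the sstable.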
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jeff Jirsa
>>>>>>
>>>>>>
>>>>>> On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <
>>>>>> kenbrot...@yahoo.com.invalid> wrote:
>>>>>>
>>>>>> Can we see the histogram?  Why wouldn’t you at times have that many
>>>>>> tombstones?  Makes sense.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kenneth Brotman
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com
>>>>>> <rahulreddy1...@gmail.com>]
>>>>>> *Sent:* Thursday, February 21, 2019 7:06 AM
>>>>>> *To:* user@cassandra.apache.org
>>>>>> *Subject:* Tombstones in memtable
>>>>>>
>>>>>>
>>>>>>
>>>>>> We have a small table with about 5k records.
>>>>>>
>>>>>> All inserts come with a 4-hour TTL, the table-level TTL is 1 day,
>>>>>> and gc_grace_seconds is 3 hours. We do 5k reads a second during peak
>>>>>> load, and during peak load we see alerts for the tombstone scanned
>>>>>> histogram reaching a million.
>>>>>>
>>>>>> We are on Cassandra 3.11.1. Please let me know how this tombstone
>>>>>> scanning in the memtable can be avoided.
>>>>>>
>>>>>>
