Re: gc_grace config for time series database

onmstester onmstester Wed, 17 Apr 2019 06:12:25 -0700

I do not use a table-level default TTL (every row has its own TTL), and rows are
never updated.


I suppose that (because everything in Cassandra is immutable) Cassandra keeps
only the insertion timestamp plus the original TTL, and computes a row's
remaining TTL from these two values and the current system time whenever it is
needed (when you select ttl() or when compaction runs).

So something like this should be attached to every row: "this row was
inserted on 4/17/2019 at 12:20 PM and should be deleted in 2 months". Whatever
happens to the row's replicas, my intention of removing it on 6/17 should not
change!
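The assumption above can be written down as a small model: each write carries its own timestamp and TTL, and the expiry moment is derived from them once, at write time. This is a minimal sketch of that assumption, not Cassandra's internal storage code; the `Cell` class and its methods are hypothetical, and the 2-month TTL is approximated as 61 days because a TTL is a number of seconds, not months.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cell:
    """Hypothetical model of one written value: write time plus its own TTL."""
    written_at: datetime
    ttl: timedelta

    @property
    def expires_at(self) -> datetime:
        # Expiry is fixed at write time: insertion timestamp + original TTL.
        return self.written_at + self.ttl

    def remaining_ttl(self, now: datetime) -> timedelta:
        # What a ttl() query would report at `now` under this model (never negative).
        return max(self.expires_at - now, timedelta(0))

    def is_live(self, now: datetime) -> bool:
        return now < self.expires_at


# The example from the message: inserted 4/17/2019 12:20 PM, "2 months" ~ 61 days.
cell = Cell(written_at=datetime(2019, 4, 17, 12, 20), ttl=timedelta(days=61))
print(cell.expires_at)                                   # 2019-06-17 12:20:00
print(cell.remaining_ttl(datetime(2019, 6, 17, 12, 0)))  # 0:20:00
print(cell.is_live(datetime(2019, 6, 18)))               # False
```

Because nothing in the model depends on which replica holds the cell, every replica that received this write agrees on the same expiry, which is the point being made: without updates, the deletion date cannot drift.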



Would you say that my idea of "gc_grace = max_hint = 3 hours" for a time
series database is not reasonable?



---- On Wed, 17 Apr 2019 17:13:02 +0430 Stefan Miklosovic 
<stefan.mikloso...@instaclustr.com> wrote ----



The TTL value decreases every second, and it is reset to the original TTL
whenever an update occurs on that row (see the example below).

Doesn't that logically imply the following? Suppose a node is down for some
time while updates keep arriving on the live nodes, and hints are saved for
three hours, after which the coordinator stops writing them. On the other
nodes your data would not be deleted, because the TTL is reset on every update
and the countdown starts again, which is correct. On the node that was down,
however, the data would be deleted, because it did not receive those updates.
So if you query that node, the data will not be there, although it should be.
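A toy simulation of this first scenario, scaled down from hours to seconds so the numbers stay small. It assumes every hint that does get stored is replayed once the node returns, and it ignores read repair and anti-entropy repair; `receives`, `is_live` and the constants are hypothetical names, not Cassandra APIs.

```python
"""Toy model of the first scenario: a replica that misses TTL-resetting updates."""

TTL = 60                      # seconds, like default_time_to_live = 60 in the table below
HINT_WINDOW = 30              # stand-in for max_hint_window_in_ms, scaled down to seconds
WRITES = [0, 20, 40, 60, 80]  # each update rewrites the row with a fresh TTL


def receives(write_time, down_from=None, down_until=None):
    """Does a replica end up with this write (directly or via a replayed hint)?"""
    if down_from is None or not (down_from <= write_time < down_until):
        return True  # replica was up and took the write directly
    # Replica was down: the coordinator only stores a hint while the outage is
    # still within the hint window (assume every stored hint is replayed later).
    return write_time - down_from <= HINT_WINDOW


def is_live(query_time, down_from=None, down_until=None):
    """Is the row still visible on this replica at query_time?"""
    received = [w for w in WRITES if receives(w, down_from, down_until)]
    return query_time < max(received) + TTL


healthy = is_live(95)                                # last write at 80 -> expires at 140
recovered = is_live(95, down_from=5, down_until=90)  # last write at 20 -> expires at 80
print(healthy, recovered)  # True False
```

The replica that was down longer than the hint window only saw the writes at 0 and 20, so its copy expires at 80 while the healthy replicas keep the row until 140.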

 

On the other hand, suppose a node was down, the data expired (was TTLed) on
the healthy nodes and a tombstone was created there. Then you start the node
that was down and, while its copy is still counting down, you hit that node
with an update. Now there is no tombstone on the previously dead node, but
there are tombstones on the healthy ones. If tombstones are purged after 3
hours, the previously dead node will never get that information, and your data
might actually end up being resurrected, because it would be replicated back
to the always-healthy nodes as part of repair.
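The second scenario reduces to the usual rule that repair has to reach the stale replica before gc_grace_seconds runs out. The sketch below encodes only that comparison; it is not Cassandra's reconciliation logic. It assumes compaction purges the tombstone as soon as gc_grace has elapsed and that a tombstone still present at repair time wins, and `repair_outcome` is a hypothetical helper.

```python
from datetime import timedelta

THREE_HOURS = timedelta(hours=3)
TEN_DAYS = timedelta(seconds=864000)  # gc_grace_seconds = 864000 in the table below


def repair_outcome(gc_grace: timedelta, repair_delay: timedelta) -> str:
    """Toy model of the second scenario.

    Healthy replicas hold a tombstone; the previously dead replica still holds a
    live copy. repair_delay is how long after the tombstone was written the
    repair runs. Assumes the tombstone is purged exactly when gc_grace elapses
    and that a tombstone still present at repair time wins.
    """
    if repair_delay < gc_grace:
        return "deleted everywhere"  # repair streams the tombstone to the stale replica
    return "resurrected"  # nothing left to shadow the live copy; repair streams it back


for grace in (THREE_HOURS, TEN_DAYS):
    print(grace, repair_outcome(grace, repair_delay=timedelta(days=1)))
# 3:00:00 resurrected
# 10 days, 0:00:00 deleted everywhere
```

With the same one-day repair delay, the 3-hour setting leaves no tombstone to propagate, while the default 10 days still does.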

 

Do you see a flaw in my reasoning?

 

cassandra@cqlsh> DESCRIBE TABLE test.test;

CREATE TABLE test.test (
    id uuid PRIMARY KEY,
    value text
) WITH bloom_filter_fp_chance = 0.6
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 60
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cassandra@cqlsh> select ttl(value) from test.test where id = 4f860bf0-d793-4408-8330-a809c6cf6375;

 ttl(value)
------------
 25

(1 rows)
cassandra@cqlsh> UPDATE test.test SET value = 'c' WHERE id = 4f860bf0-d793-4408-8330-a809c6cf6375;
cassandra@cqlsh> select ttl(value) from test.test where id = 4f860bf0-d793-4408-8330-a809c6cf6375;

 ttl(value)
------------
 59

(1 rows)
cassandra@cqlsh> select * from test.test ;

 id                                   | value
--------------------------------------+-------
 4f860bf0-d793-4408-8330-a809c6cf6375 |     c
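The session above can be mirrored with a tiny model in which every write, including an UPDATE that does not say USING TTL, starts a fresh countdown from the table's default_time_to_live. The second-level timestamps are made up to reproduce the printed values (25, then 59), the original value "b" is hypothetical, and `Row` is an illustrative class, not a driver API.

```python
DEFAULT_TTL = 60  # default_time_to_live = 60 from DESCRIBE TABLE test.test


class Row:
    """Hypothetical single-column row; every write restarts the TTL countdown."""

    def __init__(self):
        self.value = None
        self.expires_at = None

    def write(self, value, now, ttl=DEFAULT_TTL):
        # A write without an explicit TTL falls back to the table default.
        self.value = value
        self.expires_at = now + ttl

    def ttl(self, now):
        # Mirrors "SELECT ttl(value)": seconds left, or None once expired.
        if self.expires_at is None or now >= self.expires_at:
            return None
        return self.expires_at - now


row = Row()
row.write("b", now=0)   # hypothetical original insert
print(row.ttl(35))      # 25, like the first SELECT
row.write("c", now=40)  # the UPDATE ... SET value = 'c'
print(row.ttl(41))      # 59, the countdown restarted
print(row.ttl(100))     # None
```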

 

 

On Wed, 17 Apr 2019 at 19:18, fald 1970 <falldi1...@gmail.com> wrote:

> Hi,
>
> According to these facts:
> 1. If a node is down for longer than max_hint_window_in_ms (3 hours by
> default), the coordinator stops writing new hints.
> 2. The main purpose of the gc_grace property is to prevent zombie data, and it
> also determines how long the coordinator should keep hint files.
>
> When we use Cassandra for time series data where:
> A) every row of data has a TTL and there are no explicit deletes, so we are
> not very worried about zombies;
> B) every minute there are hundreds of write requests to each node, so if one
> of the nodes was down for longer than max_hint_window_in_ms, we should run a
> manual repair on that node anyway, and the hints stored on the coordinator
> won't be needed.
>
> So, finally, the question: is it a good idea to set gc_grace equal to
> max_hint_window_in_ms (divided by 1000 to convert to seconds), for example
> setting both to 3 hours? Why keep the tombstones for 10 days when they won't
> be needed at all?
>
> Best Regards
> Federica Albertini
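The arithmetic behind the proposal, as a minimal sketch: max_hint_window_in_ms is a node-wide cassandra.yaml setting in milliseconds, while gc_grace_seconds is a per-table option in seconds, so the two are matched by dividing by 1000. `needs_manual_repair` and the keyspace/table name `my_ks.my_ts_table` are placeholders; the printed statement uses standard CQL ALTER TABLE syntax but is not taken from the thread.

```python
# Proposal from the question: gc_grace = max_hint_window_in_ms / 1000 = 3 hours.
MAX_HINT_WINDOW_IN_MS = 3 * 60 * 60 * 1000        # 10,800,000 ms (the 3-hour default)
GC_GRACE_SECONDS = MAX_HINT_WINDOW_IN_MS // 1000  # 10,800 s


def needs_manual_repair(outage_seconds: int) -> bool:
    """Fact 1: once a node has been down longer than the hint window,
    the coordinator stops writing hints, so repair must fill the gap."""
    return outage_seconds * 1000 > MAX_HINT_WINDOW_IN_MS


# Table-level setting the proposal implies (keyspace/table names are placeholders).
alter_stmt = f"ALTER TABLE my_ks.my_ts_table WITH gc_grace_seconds = {GC_GRACE_SECONDS};"

print(GC_GRACE_SECONDS)               # 10800
print(needs_manual_repair(2 * 3600))  # False
print(needs_manual_repair(4 * 3600))  # True
print(alter_stmt)
```

Keeping the conversion in one place makes it obvious that the two settings live in different units and different places (node config vs. table schema), so changing one does not change the other.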

 

