Hi,

Is there a way to trigger MajorDeltaCompactionOp manually for a table, tablet, or rowset?

We've run into the following issue:

0. A Kudu table is created.
1. Data is inserted into it.
2. A SELECT query runs fast (a matter of a few seconds).
3. All data is deleted from the table (the table itself is not dropped).
4. The same SELECT query now runs slowly (4-6 minutes).

Investigating the issue and reading the Kudu documentation led me to think that
delete operations are done logically: the table still physically contains the
written data, and the deletes are applied on top of it on every scan.
I had a look at the Kudu tablet, and there are quite large "redo" blocks (see
the example rowset below).
We thought compression and encoding might play a role (reducing the chances of
a compaction running), but removing them (keeping the column defaults) didn't
help either.
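To make the suspected behavior concrete, here is a toy model in Python (an illustration of the idea only, not Kudu's actual scan path): base rows stay on disk, deletes live in a separate delta set, and every scan still reads each base row before filtering it out. The function name `scan_with_delete_deltas` is hypothetical.

```python
from typing import List, Set, Tuple

Row = Tuple[int, str]


def scan_with_delete_deltas(base_rows: List[Row],
                            deleted_keys: Set[int]) -> Tuple[List[Row], int]:
    """Toy model (hypothetical): read every base row, then drop the deleted ones."""
    visible: List[Row] = []
    rows_read = 0
    for key, value in base_rows:
        rows_read += 1            # the base data is still read on every scan
        if key in deleted_keys:   # the delete delta is applied on top at scan time
            continue
        visible.append((key, value))
    return visible, rows_read


# Delete every row: the result is empty, but the scan still reads all base rows.
base = [(i, f"v{i}") for i in range(100_000)]
deleted = {key for key, _ in base}
visible, rows_read = scan_with_delete_deltas(base, deleted)
print(len(visible), rows_read)  # -> 0 100000
```

In this model the scan cost tracks the size of the base data, not the number of live rows, which would match the slowdown in step 4 until the deleted rows are physically removed.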
Our tablet server setup:

- maintenance_manager_num_threads=10 (increased from the default)
- tablet_delta_store_major_compact_min_ratio=0.10000000149011612 (the default value)
- Kudu 1.7.0-cdh5.15.0

From the documentation and comments in the code, I found this description of
tablet_delta_store_major_compact_min_ratio: "Minimum ratio of sizeof(deltas)
to sizeof(base data) before a major compaction."
And: "Major compactions: the score will be the result of
sizeof(deltas)/sizeof(base data), unless it is smaller than
tablet_delta_store_major_compact_min_ratio or if the delta files are only
composed of deletes, in which case the score is brought down to zero."
Our redo deltas consist only of deletes (delete_count=[2190649],
reinsert_count=[0], no updates), so the score is zero and a major delta
compaction is never scheduled. As a result, the table has stayed in this state
for more than a day.
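A minimal Python sketch of the scoring rule quoted above, written only from the quoted text (it is not Kudu's implementation). `DeltaStats` and `major_delta_compaction_score` are hypothetical names, the zero-base-size guard is an added assumption, and the sizes are rough approximations taken from the rowset dump below, counting only the column blocks as base data (also an assumption).

```python
from dataclasses import dataclass

# Default of --tablet_delta_store_major_compact_min_ratio, as quoted in this message.
MAJOR_COMPACT_MIN_RATIO = 0.10000000149011612


@dataclass
class DeltaStats:
    """Hypothetical per-rowset summary of REDO delta stats."""
    delta_bytes: int     # total size of the REDO delta blocks
    base_bytes: int      # total size of the base column blocks (assumption)
    delete_count: int
    reinsert_count: int
    update_count: int    # sum of update_counts_by_col_id


def major_delta_compaction_score(stats: DeltaStats,
                                 min_ratio: float = MAJOR_COMPACT_MIN_RATIO) -> float:
    """Models the quoted rule: ratio of delta to base size, zeroed in two cases."""
    if stats.base_bytes == 0:  # guard added for the sketch; not in the quoted text
        return 0.0
    ratio = stats.delta_bytes / stats.base_bytes
    only_deletes = stats.reinsert_count == 0 and stats.update_count == 0
    if ratio < min_ratio or only_deletes:
        return 0.0
    return ratio


# Approximate numbers from the dump: ~14.04M of redo deltas vs ~18.25M of column data.
rowset = DeltaStats(delta_bytes=14_040_000, base_bytes=18_250_000,
                    delete_count=2_190_649, reinsert_count=0, update_count=0)
print(major_delta_compaction_score(rowset))  # -> 0.0
```

Under these assumptions, the delta-to-base ratio is about 0.77, well above the 0.1 threshold, yet the score is still zero because the deltas are delete-only.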

While most of our tables will mainly serve scans, there will be a couple of
large tables with a large number of deletions (though not of all their data).
Could you advise how to improve scan performance after large deletions?

block-id | block-kind  | column | cfile-size | cfile-data-type | cfile-delta-stats                                                                                                           | cfile-encoding  | cfile-compression
---------+-------------+--------+------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------+------------------
24693586 | column      | var1   | 2.80M      | int64           |                                                                                                                             | BIT_SHUFFLE     | NO_COMPRESSION
24693587 | column      | var2   | 100.7K     | int64           |                                                                                                                             | BIT_SHUFFLE     | NO_COMPRESSION
24693588 | column      | var3   | 4.95M      | int64           |                                                                                                                             | BIT_SHUFFLE     | NO_COMPRESSION
24693589 | column      | var4   | 1.58M      | string          |                                                                                                                             | DICT_ENCODING   | LZ4
24693590 | column      | var5   | 8.82M      | string          |                                                                                                                             | PLAIN_ENCODING  | LZ4
24693591 | column      | var6   | 2.7K       | string          |                                                                                                                             | DICT_ENCODING   | LZ4
24700691 | redo        |        | 14.04M     | binary          | ts range=[6319363930065100800, 6319364908129189926], delete_count=[2190649], reinsert_count=[0], update_counts_by_col_id=[] | PLAIN_ENCODING  | LZ4
24693592 | bloom       |        | 5.04M      | binary          |                                                                                                                             | PLAIN_ENCODING  | NO_COMPRESSION
24693593 | adhoc-index |        | 8.94M      | binary          |                                                                                                                             | PREFIX_ENCODING | LZ4

Kind Regards,
Sergejs Andrejevs
