Hi,

Is there a way to trigger MajorDeltaCompactionOp manually for a table/tablet/rowset?
We've run into an issue:

0. A Kudu table is created.
1. Data is inserted into it.
2. A select query is run; it is fast (a matter of a few seconds).
3. All data is deleted from the table (the table itself is not dropped).
4. The select query is run again; it is slow (4-6 minutes).

Investigating and reading the Kudu documentation has led me to think that delete operations are done logically: physically the table still contains the written data, and the deletes are applied on top of it each time. I had a look at the Kudu tablet and there are quite large "redo" blocks (see one rowset example below).

I thought compression and encoding might play a role (reducing the chances that compaction runs), but removing them (keeping the column defaults) hasn't helped either.

We run the tservers with:
- maintenance_manager_num_threads=10 (increased compared to the default)
- tablet_delta_store_major_compact_min_ratio=0.10000000149011612 (default value)
- kudu 1.7.0-cdh5.15.0

In the documentation and code comments I found this description of tablet_delta_store_major_compact_min_ratio:

"Minimum ratio of sizeof(deltas) to sizeof(base data) before a major compaction."

and

"Major compactions: the score will be the result of sizeof(deltas)/sizeof(base data), unless it is smaller than tablet_delta_store_major_compact_min_ratio or if the delta files are only composed of deletes, in which case the score is brought down to zero."

So basically the table has stayed in this state for more than a day.

While the majority of our tables will mostly be scanned, there will be a couple of large tables with a large number of deletions (though not of all data). Could you advise how to improve scan performance after large deletions?
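To check my reading of the quoted rule against the rowset shown in this message, here is a minimal Python sketch. It is only my interpretation of the comment text, not Kudu's actual implementation: the function name and its parameters are hypothetical, and I am assuming that "base data" means the column blocks only (bloom and adhoc-index excluded) and that K and M in the tool output are binary units.

```python
# Hypothetical sketch of the scoring rule as quoted from the Kudu comments.
# This is NOT Kudu code; names and unit handling are assumptions.

KIB = 1024
MIB = 1024 * 1024


def major_delta_compaction_score(delta_bytes, base_bytes,
                                 min_ratio=0.1, deletes_only=False):
    """Return sizeof(deltas)/sizeof(base data), or 0.0 when the ratio is
    below min_ratio or the delta files contain only deletes."""
    if base_bytes <= 0:
        return 0.0
    ratio = delta_bytes / base_bytes
    if ratio < min_ratio or deletes_only:
        return 0.0
    return ratio


# Sizes taken from the rowset listing in this message (column blocks only,
# which is an assumption about what counts as base data).
base_bytes = (2.80 * MIB + 100.7 * KIB + 4.95 * MIB
              + 1.58 * MIB + 8.82 * MIB + 2.7 * KIB)
redo_bytes = 14.04 * MIB

# The redo block has delete_count > 0, reinsert_count = 0 and no column
# updates, so it is composed of deletes only.
score_as_observed = major_delta_compaction_score(
    redo_bytes, base_bytes, deletes_only=True)

# Same sizes, but pretending the deltas also contained updates.
score_if_updates = major_delta_compaction_score(
    redo_bytes, base_bytes, deletes_only=False)

print(score_as_observed, round(score_if_updates, 2))
```

If this reading is right, the size ratio for this rowset (roughly 0.77) is far above the 0.1 threshold, yet the score still drops to zero because the redo file holds only deletes, which would explain why the major delta compaction is never scheduled.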
block-id | block-kind  | column | cfile-size | cfile-data-type | cfile-encoding  | cfile-compression
---------+-------------+--------+------------+-----------------+-----------------+-------------------
24693586 | column      | var1   | 2.80M      | int64           | BIT_SHUFFLE     | NO_COMPRESSION
24693587 | column      | var2   | 100.7K     | int64           | BIT_SHUFFLE     | NO_COMPRESSION
24693588 | column      | var3   | 4.95M      | int64           | BIT_SHUFFLE     | NO_COMPRESSION
24693589 | column      | var4   | 1.58M      | string          | DICT_ENCODING   | LZ4
24693590 | column      | var5   | 8.82M      | string          | PLAIN_ENCODING  | LZ4
24693591 | column      | var6   | 2.7K       | string          | DICT_ENCODING   | LZ4
24700691 | redo        |        | 14.04M     | binary          | PLAIN_ENCODING  | LZ4
24693592 | bloom       |        | 5.04M      | binary          | PLAIN_ENCODING  | NO_COMPRESSION
24693593 | adhoc-index |        | 8.94M      | binary          | PREFIX_ENCODING | LZ4

cfile-delta-stats (empty for every block except redo block 24700691):
  ts range=[6319363930065100800, 6319364908129189926], delete_count=[2190649], reinsert_count=[0], update_counts_by_col_id=[]

Kind Regards,
Sergejs Andrejevs