[ https://issues.apache.org/jira/browse/HIVE-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denys Kuzmenko reassigned HIVE-26704: ------------------------------------- Assignee: Denys Kuzmenko > Cleaner shouldn't be blocked by global min open txnId > ----------------------------------------------------- > > Key: HIVE-26704 > URL: https://issues.apache.org/jira/browse/HIVE-26704 > Project: Hive > Issue Type: Task > Reporter: Denys Kuzmenko > Assignee: Denys Kuzmenko > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Single transaction blocks cluster-wide Cleaner operations* > Currently, if there is a single long-running transaction that can prevent the > Cleaner to clean up any tables. This causes file buildup in tables, which can > cause performance penalties when listing the directories (note that the > compaction is not blocked by this, so unnecessary data is not read, but the > files remain there which causes performance penalty). > We can reduce the protected files from the open transaction if we have > query-table correlation data stored in the backend DB, but this change will > need the current method of recording that detail to be revisited. > The naive and somewhat backward-compatible approach is to capture the > minOpenWriteIds per table. It involves a non-mutation operation (as in, there > is no need for the HMS DB to wait for another user’s operation to record it). > This does spew data writes into the HMS backend DB, but this is a blind > insert operation that can be group-committed across many users. -- This message was sent by Atlassian Jira (v8.20.10#820010)