[ https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor reassigned HIVE-24870:
-----------------------------------

    Assignee:     (was: László Bodor)

> Metastore: cleanup unused column descriptors asynchronously in batches
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24870
>                 URL: https://issues.apache.org/jira/browse/HIVE-24870
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> HIVE-2246 introduces CD_ID to optimize the metastore db (details there).
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called
> in every alter-partition kind of operation. During replication,
> alterPartition can be a heavy path, and it gains nothing from running
> removeUnusedColumnDescriptor immediately. Moreover, it contains a query like
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> which can take a relatively long time compared to the alter partition itself.
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4982
> {code}
>       query = pm.newQuery("select count(1) from " +
>         "org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
>       query.declareParameters("MColumnDescriptor inCD");
>       long count = ((Long)query.execute(oldCD)).longValue();
>       //if no other SD references this CD, we can throw it out.
>       if (count == 0) {
> {code}
> My proposal is to run this cleanup asynchronously in batches, at a
> configurable interval (seconds, minutes, etc.).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
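
The batched cleanup proposed in the issue could look roughly like the sketch below. It is an illustration only, not Hive code: `DeferredCdCleaner`, `markCandidate`, `runBatch`, and the batch size parameter are hypothetical names, the `isReferenced` predicate stands in for the `select count(1) ... where (this.cd == inCD)` query, and `deleteCd` stands in for whatever actually removes the column descriptor. The alter-partition path only records a candidate CD id; the reference check runs later on a scheduler thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

/** Hypothetical sketch: defers column-descriptor cleanup to a periodic batch job. */
class DeferredCdCleaner implements AutoCloseable {
    // CD ids that an alter-partition call may have orphaned.
    private final ConcurrentLinkedQueue<Long> pending = new ConcurrentLinkedQueue<>();
    private final LongPredicate isReferenced; // assumption: wraps the count query on SDS
    private final LongConsumer deleteCd;      // assumption: wraps the actual CD delete
    private final int batchSize;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cd-cleaner");
            t.setDaemon(true); // do not keep the JVM alive for cleanup
            return t;
        });

    DeferredCdCleaner(LongPredicate isReferenced, LongConsumer deleteCd, int batchSize) {
        this.isReferenced = isReferenced;
        this.deleteCd = deleteCd;
        this.batchSize = batchSize;
    }

    /** Called from the hot path: records the candidate and runs no query. */
    void markCandidate(long cdId) {
        pending.add(cdId);
    }

    /** Starts the periodic cleanup at the configured interval. */
    void start(long interval, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runBatch();
            } catch (RuntimeException e) {
                // Swallow so that one failed batch does not cancel later runs.
            }
        }, interval, interval, unit);
    }

    /**
     * Drains up to batchSize distinct candidates and deletes those that no
     * storage descriptor references any more. Returns the number deleted.
     */
    int runBatch() {
        Set<Long> batch = new LinkedHashSet<>(); // de-duplicates repeated candidates
        Long id;
        while (batch.size() < batchSize && (id = pending.poll()) != null) {
            batch.add(id);
        }
        int deleted = 0;
        for (long cdId : batch) {
            if (!isReferenced.test(cdId)) {
                deleteCd.accept(cdId);
                deleted++;
            }
        }
        return deleted;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because the reference check happens at batch time rather than at alter time, a CD that gets re-referenced between the two is simply skipped. A real implementation would still need to run the check and the delete in one metastore transaction, and could replace the per-id check with a single grouped query per batch.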