Csaba Ringhofer created HIVE-22565: -------------------------------------- Summary: Make calling alter_table unnecessary during inserts Key: HIVE-22565 URL: https://issues.apache.org/jira/browse/HIVE-22565 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Csaba Ringhofer
tl dr: it would be good to set the table's writeId during commit to make the extra alter_table call unnecessary This came up during the implementation of (insert_only) ACID inserts in Apache Impala. The following description deals with the non-partitioned case, partitioned tables are a bit more complicated. apply_table is called by Impala during inserts mainly to set stats to non-accurate: - the table's writeId is set to the writeId of the insert - remove table property column_stats_accurate In the past we had the false assumption that setting the writeId is done automatically by committing the transaction. It would be nice to have a version of commit that actually does this - commits the transaction + changes the writeId/marks stats as inaccurate in a single atomic step. The current state of alter_table + commit being non-atomic can lead to weird scenarios in parallel inserts(+ computes stats). Impala calls apply_table before commit, so the calls to HMS during inserts look like this: 1. open new transaction 2. get shared lock on the table 3. get write id ... write the files ... 4. call alter_table to remove column_stats_accurate (this also sets writeId) 5. commit the transaction So the following can occur with two parallel writes + a compute stats: 1. txn 1 calls alter_table (sets to writeId of txn 1) 2. txn 2 calls alter_table (sets to writeId of txn 2) 3. txn 2 is committed 4. compute stats runs (gets validWriteList, reads the table, sets the stats with alter_table) 5. txn 1 is committed The compute stats will have the writeId of txn 2 in it's validWriteId list, so it will assume that it computed accurate stats. After step 5. the stats will be considered accurate while they do not contain the new rows from txn 1. Another issue with frequent alter_table calls is that the effect of actual ALTER TABLE commands that use shared locks (I think SET TBLPROPERTIES does this in Hive) can be simply overwritten by alter_table calls from inserts that used a different cached version of the table. This is generally a problem if ALTER TABLE is called from different clients (without taking exclusive lock), but doing parallel DMLs is probably more common than doing parallel DDLs. So issues can occur even if clients use the API correctly - another problem is that the hard to use API may lead to buggy client implementation that can easily mess up things for other components too. -- This message was sent by Atlassian Jira (v8.3.4#803005)