> Dimensions change, and I'd rather do update than recreate a snapshot.

Slowly changing dimensions are the common use case for Hive's ACID MERGE.
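
For illustration, here is a minimal sketch of a dimension upsert with MERGE. The table names (customer, customer_updates), the columns, and the bucket count are hypothetical, and the sketch assumes a Hive version with MERGE support (2.2 or later) and ACID enabled for the session.

```sql
-- Assumption: ACID must be enabled; these are the usual session settings.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical dimension table: transactional, ORC, bucketed on its key.
CREATE TABLE customer (
  customer_id INT,
  name        STRING,
  city        STRING
)
CLUSTERED BY (customer_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Hypothetical staging table holding the hourly batch of inserts and updates.
CREATE TABLE customer_updates (
  customer_id INT,
  name        STRING,
  city        STRING
)
STORED AS ORC;

-- Upsert: update rows whose key already exists, insert the rest.
MERGE INTO customer AS t
USING customer_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET name = s.name, city = s.city
WHEN NOT MATCHED THEN
  INSERT VALUES (s.customer_id, s.name, s.city);
```

Note that the bucketing column (customer_id here) cannot appear in the UPDATE SET list, and newer Hive releases may not require bucketing for transactional tables at all.
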
The feature you need is most likely covered by https://issues.apache.org/jira/browse/HIVE-10924

The second comment on that JIRA describes the scenario:

"Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys."

Any other approach would need a full snapshot re-materialization, whereas ACID can apply a 2% upsert as DELETE + INSERT instead of rewriting the original file.

If you do not have any isolation concerns (for example, a query reading while only 50% of your update has been applied), using HBase-backed dimension tables in Hive is possible, but it does not offer the same transactional consistency as ACID MERGE.

Cheers,
Gopal