Hi,

This is really a Hive question, and hopefully you can follow up on it on the Hive user@ mailing list.

But since you're looking at Hive-on-Tez, this issue seems familiar to me.

> "insert overwrite table 2h2 partition (dt) select *,TIME_STAMP from 2h_tmp;"
>
> Tez allocates only one reducer to the job, which results in a 6-hour run.

That doesn't look like it needs a reducer in normal cases. Is the destination table bucketed into 1 bucket?

> Is this related to https://issues.apache.org/jira/browse/HIVE-7158 ? It is marked as resolved for Hive 0.14.

No, it is not.

This might be related to a feature that is turned off by default in HDP-2.2. If your dynamic-partition insert writes to more than one partition, the feature you need is in HIVE-6455 + HIVE-6761:

set hive.optimize.sort.dynamic.partition=true;

This is off by default, since it slows down ETL where the destination is exactly one partition.

I keep updating the hive-testbench to do the right thing (because it does both TPC-DS and TPC-H), so those settings might be of help:

https://github.com/hortonworks/hive-testbench/blob/hive14/settings/load-partitioned.sql#L10

Cheers,
Gopal
