Hi,

This is really a Hive question, so hopefully you can follow up on it on
the Hive user@ mailing list.

But since you're looking at Hive-on-Tez, this issue seems familiar to me.

> "insert overwrite table 2h2 partition (dt) select *,TIME_STAMP  from
>2h_tmp;"
> 
> Tez allocates only one reducer to the job which results in a 6 hour run.

That doesn't look like it needs a reducer in normal cases.

Is the destination table bucketed into 1 bucket?
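
One way to check, as a rough sketch using the table name from your query
(the "Num Buckets" and "Bucket Columns" rows in the output are the ones to
look at):

```sql
-- Shows storage details for the destination table, including
-- "Num Buckets" and "Bucket Columns".
-- Table name 2h2 is taken from the quoted query.
DESCRIBE FORMATTED 2h2;
```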

> Is this related to https://issues.apache.org/jira/browse/HIVE-7158 , it
>is marked as resolved for hive 0.14

No, it is not.

This might be related to a feature that is turned off by default in HDP-2.2.

If you have >1 partition in the dynamic partitioned insert, the feature
you need is in HIVE-6455 + HIVE-6761.


set hive.optimize.sort.dynamic.partition=true;


This is off by default, since it slows down ETL where the destination is
exactly 1 partition.
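
Put together with the query you quoted, a session might look roughly like
the sketch below. The two hive.exec.dynamic.partition settings are my
assumption about what a dynamic-partition insert usually needs; they are
not something I can see from your mail, so drop them if your session
already has them.

```sql
-- Assumption: dynamic partitioning is not already enabled in this session.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The setting discussed above: sort rows by the dynamic partition key
-- before they are written.
set hive.optimize.sort.dynamic.partition=true;

-- The insert from the original question, unchanged.
insert overwrite table 2h2 partition (dt)
select *, TIME_STAMP from 2h_tmp;
```

If the load only ever targets a single partition, leaving the sort setting
off is probably the better choice, for the reason given above.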

I keep updating the hive-testbench to do the right thing (because it does
both TPC-DS and TPC-H), so those settings might be of help:

https://github.com/hortonworks/hive-testbench/blob/hive14/settings/load-partitioned.sql#L10


Cheers,
Gopal

