On 12/6/14, 10:11 PM, Daniel Haviv wrote:
Isn't there a way to make Hive allocate more than one reducer for the whole
job? Maybe one per partition.
Yes.
hive.optimize.sort.dynamic.partition=true; does nearly that.
It raises the net number of useful reducers to total-num-of-partitions x
total-num-of-buckets.
If you have, say, data being written into six hundred partitions with 1
bucket each, it can use anywhere between 1 and 600 reducers (with hashcode
collisions causing skew, of course).
It's turned off by default because it noticeably slows down inserts into a
single partition without buckets.
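A minimal sketch of what this looks like in practice (the table and column
names here are hypothetical, and the second SET is assumed to be needed for a
fully dynamic partition insert):

```sql
-- Spread the dynamic-partition insert across reducers, one-ish per
-- (partition x bucket) combination.
SET hive.optimize.sort.dynamic.partition=true;
-- Allow all partition columns to be dynamic (no static partition spec).
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical tables: dst is partitioned by dt; the dt value comes from
-- the last column of the SELECT.
INSERT OVERWRITE TABLE dst PARTITION (dt)
SELECT col1, col2, dt
FROM src;
```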
Cheers,
Gopal
On Dec 7, 2014, at 06:06, Gopal V <gop...@apache.org> wrote:
On 12/6/14, 6:27 AM, Daniel Haviv wrote:
Hi,
I'm executing an insert statement that goes over 1TB of data.
The map phase goes well, but the reduce stage uses only one reducer, which
becomes a great bottleneck.
Are you inserting into a bucketed or sorted table?
If the destination table is bucketed + partitioned, you can use the dynamic
partition sort optimization to get beyond the single reducer.
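For reference, a sketch of a destination table that is both partitioned and
bucketed, which is the shape this optimization needs (names and the bucket
count are hypothetical, not from the thread):

```sql
-- Hypothetical DDL: partitioned by dt, bucketed on col1, so the insert can
-- fan out across (partition x bucket) reducers.
CREATE TABLE dst (
  col1 STRING,
  col2 BIGINT
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (col1) INTO 8 BUCKETS;
```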
Cheers,
Gopal