Thanks Gopal, I don't want to divide my data any further. Isn't there a way to make Hive allocate more than one reducer for the whole job? Maybe one per partition.
Daniel

> On 7 Dec 2014, at 06:06, Gopal V <[email protected]> wrote:
>
>> On 12/6/14, 6:27 AM, Daniel Haviv wrote:
>> Hi,
>> I'm executing an insert statement that goes over 1TB of data.
>> The map phase goes well but the reduce stage only used one reducer which
>> becomes a great bottleneck.
>
> Are you inserting into a bucketed or sorted table?
>
> If the destination table is bucketed + partitioned, you can use the dynamic
> partition sort optimization to get beyond the single reducer.
>
> Cheers,
> Gopal
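
For reference, a minimal sketch of what the dynamic partition sort optimization mentioned in the quoted reply might look like in a Hive session. It assumes Hive 0.13 or later; the table names `events_staging` and `events`, the columns, and the partition column `dt` are hypothetical placeholders, not taken from this thread. Whether it actually spreads the work over more reducers depends on the table layout and the Hive version.

```sql
-- Allow partitions to be created dynamically from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Enable the dynamic partition sort optimization: rows are shuffled and
-- sorted by the partition (and bucket) columns, so each reducer writes
-- to a small number of partitions at a time.
SET hive.optimize.sort.dynamic.partition=true;

-- Hypothetical tables: events is partitioned by dt, events_staging holds
-- the raw rows. The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE events PARTITION (dt)
SELECT user_id, payload, dt
FROM events_staging;
```

If the destination table is not bucketed, adding `DISTRIBUTE BY dt` to the SELECT is another way to ask Hive to route rows to reducers by partition value, though heavily skewed partitions can still leave one reducer with most of the data.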
