On 12/6/14, 6:27 AM, Daniel Haviv wrote:
Hi, I'm executing an insert statement that goes over 1TB of data. The map phase goes well, but the reduce stage uses only one reducer, which becomes a major bottleneck.
Are you inserting into a bucketed or sorted table? If the destination table is bucketed + partitioned, you can use the dynamic partition sort optimization to get beyond the single reducer.
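
A rough sketch of what enabling it might look like, assuming Hive 0.13 or later (where hive.optimize.sort.dynamic.partition is available). The table and column names (dest_table, src_table, col_a, col_b, dt) are hypothetical placeholders, so adapt them to your schema:

```sql
-- Hypothetical table/column names; the SET properties are standard Hive settings.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Needed before Hive 2.0; from 2.0 on bucketing is always enforced and this property is gone.
SET hive.enforce.bucketing=true;
-- The dynamic partition sort optimization itself.
SET hive.optimize.sort.dynamic.partition=true;

-- The dynamic partition column (dt) goes last in the SELECT list.
INSERT OVERWRITE TABLE dest_table PARTITION (dt)
SELECT col_a, col_b, dt
FROM src_table;
```

Whether this actually spreads the work over more reducers depends on the bucket and partition layout of the destination table, so it is worth checking the reducer count in the query plan (EXPLAIN) before running the full 1TB insert.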
Cheers, Gopal
