I am using Pig version 0.9.1.

On Jun 1, 2012, at 11:49 AM, Jonathan Coveney wrote:

> Pankaj,
> 
> What version of Pig are you using? Later versions of Pig include logic
> that sets parallelism automatically (though these heuristics are
> sometimes wrong).
> 
> There are also some operations which will force you to use 1 reducer. It
> depends on what your script is doing.
> 
> 2012/6/1 Pankaj Gupta <[email protected]>
> 
>> Hi,
>> 
>> I just realized that one of my large-scale Pig jobs, which has 100K map
>> tasks, actually has only one reduce task. Reading the documentation, I see
>> that the number of reduce tasks is set by the PARALLEL clause, whose
>> default value is 1. I have a few questions about this:
>> 
>> 1. Why is the default number of reduce tasks 1?
>> 2. (Related to the first question) Why aren't reduce tasks parallelized
>>    automatically in Pig?
>> 3. How do I choose a good number of reduce tasks for my Pig jobs?
>> 
>> Thanks in advance,
>> Pankaj
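
For reference, the automatic parallelism mentioned in the reply can be sketched roughly as below. This is only an approximation of the documented heuristic, not Pig's actual code. It assumes the estimate is driven by the properties pig.exec.reducers.bytes.per.reducer (assumed default about 1 GB) and pig.exec.reducers.max (assumed default 999), and that it applies only when neither a PARALLEL clause nor SET default_parallel is given. The function name estimate_reducers is hypothetical.

```python
# Assumed defaults for the two Pig properties named above; check them
# against the documentation for the Pig version actually in use.
BYTES_PER_REDUCER = 10**9   # pig.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999          # pig.exec.reducers.max


def estimate_reducers(input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: approximate the reducer count chosen when the
    script sets neither PARALLEL nor default_parallel."""
    if input_bytes < 0 or bytes_per_reducer <= 0 or max_reducers < 1:
        raise ValueError("sizes must be non-negative and limits positive")
    # Ceiling division: one reducer per started chunk of input.
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always at least one reducer, never more than the configured cap.
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    for size in (0, 50 * 10**9, 5 * 10**12):
        print(size, estimate_reducers(size))
```

Because the estimate looks only at input size, it can be far off for jobs whose intermediate data is much larger or smaller than the input. In that case an explicit PARALLEL clause on the reduce-side operator, or SET default_parallel for the whole script, overrides it.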
