I am using Pig version 0.9.1.

On Jun 1, 2012, at 11:49 AM, Jonathan Coveney wrote:
> Pankaj,
>
> What version of pig are you using? In later versions of pig, it should have
> some logic around automatically setting parallelisms (though sometimes
> these heuristics will be wrong).
>
> There are also some operations which will force you to use 1 reducer. It
> depends on what your script is doing.
>
> 2012/6/1 Pankaj Gupta <[email protected]>
>
>> Hi,
>>
>> I just realized that one of my large scale pig jobs that has 100K map jobs
>> actually only has one reduce task. Reading the documentation I see that the
>> number of reduce tasks is defined by the PARALLEL clause whose default
>> value is 1. I have a few questions around this:
>>
>> # Why is the default value of reduce tasks 1?
>> # (Related to first question) Why aren't reduce tasks parallelized
>> automatically in Pig?
>> # How do I choose a good value of reduce tasks for my pig jobs?
>>
>> Thanks in Advance,
>> Pankaj
