
Re: Number of reduce tasks

2012-06-18 Thread Pankaj Gupta

Aniket: No, I am not using HCatalog. To follow up on this thread, I was indeed able to run multiple reduce tasks using the PARALLEL clause. Thanks everyone for helping out. Unfortunately, I ran into an out-of-memory error after that and I'm debugging that now (created a separate thread for adv

Re: Number of reduce tasks

2012-06-18 Thread Aniket Mokashi

Pankaj, are you using HCatalog?

On Fri, Jun 1, 2012 at 5:24 PM, Prashant Kommireddi wrote:
> Right. And the documentation provides a list of operations that can be parallelized.
>
> On Jun 1, 2012, at 4:50 PM, Dmitriy Ryaboy wrote:
> > That being said, some operators such as "group all" and

Re: Number of reduce tasks

2012-06-01 Thread Prashant Kommireddi

Right. And the documentation provides a list of operations that can be parallelized.

On Jun 1, 2012, at 4:50 PM, Dmitriy Ryaboy wrote:
> That being said, some operators such as "group all" and limit, do require using only 1 reducer, by nature. So it depends on what your script is doing.
>
> O

Re: Number of reduce tasks

2012-06-01 Thread Dmitriy Ryaboy

That being said, some operators, such as "group all" and "limit", require using only 1 reducer by nature. So it depends on what your script is doing.

On Jun 1, 2012, at 12:26 PM, Prashant Kommireddi wrote:
> Automatic Heuristic works the same in 0.9.1
> http://pig.apache.org/docs/r0.9.1/perf.

Re: Number of reduce tasks

2012-06-01 Thread Prashant Kommireddi

The automatic heuristic works the same in 0.9.1 (http://pig.apache.org/docs/r0.9.1/perf.html#parallel), but you might be better off setting the number of reducers manually after looking at the JobTracker counters. You should be fine using PARALLEL with any of the operators mentioned in the doc.

-Prashant

On Fri, Jun 1, 2012

Re: Number of reduce tasks

2012-06-01 Thread Pankaj Gupta

Hi Prashant,

Thanks for the tips. We haven't moved to Pig 0.10.0 yet, but it seems like a very useful upgrade. For the moment, though, it seems that I should be able to use the 1 GB per reducer heuristic and specify the number of reducers in Pig 0.9.1 by using the PARALLEL clause in the Pig script. D

Re: Number of reduce tasks

2012-06-01 Thread Pankaj Gupta

I am using Pig version 0.9.1.

On Jun 1, 2012, at 11:49 AM, Jonathan Coveney wrote:
> Pankaj,
>
> What version of pig are you using? In later versions of pig, it should have some logic around automatically setting parallelisms (though sometimes these heuristics will be wrong).
>
> There are

Re: Number of reduce tasks

2012-06-01 Thread Prashant Kommireddi

Also, please note that the default number of reducers is based on the size of the input dataset. In the basic case, Pig will "automatically" spawn one reducer for each GB of input, so if your input dataset is 500 GB you should see 500 reducers being spawned (though this is excessive in a lot of cases). This document t
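
The one-reducer-per-GB heuristic described above can be sketched in a few lines of Python. This is an illustration only, not Pig's actual code: the function name estimate_reducers is hypothetical, the defaults are assumed to mirror Pig's pig.exec.reducers.bytes.per.reducer (1,000,000,000 bytes) and pig.exec.reducers.max (999) properties, and the rounding up and the floor of one reducer are assumptions. Check the performance documentation for your Pig version before relying on the exact numbers.

```python
def estimate_reducers(input_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Rough sketch of a 'one reducer per GB of input' estimate.

    Assumed defaults: 1,000,000,000 bytes per reducer and a cap of 999
    reducers. Rounding up and the minimum of one reducer are assumptions.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    # Ceiling division without floating point.
    wanted = (input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Clamp to the range [1, max_reducers].
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    # 500 GB of input (taking 1 GB as 10**9 bytes), as in the example above.
    print(estimate_reducers(500 * 10**9))
```

If the estimate looks excessive for a given job, a smaller number can be set by hand with the PARALLEL clause on the relevant operator, as discussed elsewhere in this thread.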

Re: Number of reduce tasks

2012-06-01 Thread Jonathan Coveney

Pankaj,

What version of Pig are you using? Later versions of Pig should have some logic for setting parallelism automatically (though sometimes these heuristics will be wrong). There are also some operations which will force you to use 1 reducer. It depends on what your script is doing


9 matches
