Re: UDF Performance Problem

2012-09-03 Thread James Newhaven
Thanks Dmitriy, all sorted now.

James

On Mon, Sep 3, 2012 at 6:21 PM, Dmitriy Ryaboy wrote:
> That's cause you used "group all" which groups everything into one
> group, which by definition can only go to one reducer.
>
> What if instead you group into some large-enough number of buckets?
>
> A

Re: UDF Performance Problem

2012-09-03 Thread Dmitriy Ryaboy
That's because you used "group all", which puts everything into one group, and a single group can by definition only go to one reducer.

What if you instead group into some large-enough number of buckets?

A = LOAD 'records.txt' USING PigStorage('\t') AS (recordId:int);
A_PRIME = FOREACH A generate *, ROUND(RAN
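
To illustrate the bucketing idea outside of Pig, here is a minimal Python sketch. It is not the script from this thread (which is cut off above) but a simulation under assumptions: the function name `bucketed_count`, the bucket count of 8, and the use of a simple count as the aggregate are all hypothetical stand-ins. It shows each record being assigned a random bucket so the work can be split across several groups, followed by a cheap second step that combines the per-bucket results.

```python
import random
from collections import defaultdict


def bucketed_count(records, num_buckets=8, seed=0):
    """Count records in two stages instead of one global group.

    Hypothetical illustration: a plain count stands in for whatever
    the real per-group computation is.
    """
    rng = random.Random(seed)

    # Stage 1: tag each record with a random bucket id. In a MapReduce
    # job, each bucket is a separate group and can go to its own reducer.
    partial = defaultdict(int)
    for _ in records:
        bucket = rng.randrange(num_buckets)
        partial[bucket] += 1

    # Stage 2: combine the small per-bucket results. This step sees at
    # most num_buckets values, so running it in one place is cheap.
    total = sum(partial.values())
    return total, dict(partial)


total, per_bucket = bucketed_count(range(1000), num_buckets=8)
print(total)
print(len(per_bucket))
```

This only helps when the computation can be expressed as partial results that are merged afterwards (counts, sums, min/max and similar); an operation that truly needs every record in one place will still end up on a single reducer.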