Unsubscribe 

--
Sent using a mobile device.

On Sat, Jul 19, 2014 at 7:39 PM, Cheolsoo Park <[email protected]>
wrote:

> Hi Adam,
> Algebraic and accumulator are mutually exclusive, so you can't use them at
> the same time.
> >> It seems that Pig will always bag up tuples between the input and
> >> intermediate aggregate stages. For my workload those bags get large and
> >> spill to disk.  The spilling in and of itself seems to cause a lot of
> >> memory pressure and then GC and slowdown.
> That said, I don't understand why you're seeing bag spilling in combiners.
> Bags are usually small on the mapper side and don't spill to disk because
> only records that are grouped by the same key are bagged together. Bag
> spilling usually happens on the reducer side. Do you have skewed keys in
> your group-by? Or can you increase the parallelism of mappers to spread
> out the load across more mappers?
> Thanks,
> Cheolsoo
> On Tue, Jul 15, 2014 at 4:53 PM, Adam Silberstein <[email protected]> wrote:
>> Hey All,
>> I’m struggling with performance of algebraic aggregates.  It seems that
>> Pig will always bag up tuples between the input and intermediate aggregate
>> stages.  For my workload those bags get large and spill to disk.  The
>> spilling in and of itself seems to cause a lot of memory pressure and then
>> GC and slowdown.
>>
>> The aggregates I am computing are things like MAX, where I would really
>> like to stream the records through the input stage and only maintain the
>> current max.  Is this possible with algebraic, accumulator or anything else?
>>
>> Thanks,
>> Adam
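
For reference, the "stream the records and only keep the current max" idea from the original question can be sketched roughly as below. This is a minimal illustration, not Pig code: StreamingMax is a hypothetical stand-in class that only mirrors the accumulate / getValue / cleanup shape of Pig's Accumulator interface, and it assumes values arrive as batches of Long. It holds one running value per key instead of materializing the whole bag.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not a Pig class: it only mirrors the
// accumulate / getValue / cleanup shape of Pig's Accumulator interface.
final class StreamingMax {
    // Running maximum for the current key; null until a value is seen.
    private Long currentMax = null;

    // Called once per batch of values belonging to the current key.
    // Only the running maximum is retained; the batch is not stored.
    void accumulate(List<Long> batch) {
        for (Long v : batch) {
            if (v != null && (currentMax == null || v > currentMax)) {
                currentMax = v;
            }
        }
    }

    // Returns the maximum seen so far, or null if no values were seen.
    Long getValue() {
        return currentMax;
    }

    // Resets state before the next key is processed.
    void cleanup() {
        currentMax = null;
    }

    public static void main(String[] args) {
        StreamingMax max = new StreamingMax();
        max.accumulate(Arrays.asList(3L, 9L, 4L));
        max.accumulate(Arrays.asList(7L));
        System.out.println(max.getValue()); // prints 9
        max.cleanup();
    }
}
```

Memory use here is constant per key regardless of how many batches are fed in, which is the property the question is after. Whether Pig actually takes a path like this for a given script depends on the query plan.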
