Unsubscribe
On Sat, Jul 19, 2014 at 7:39 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Adam,
>
> Algebraic and accumulator are mutually exclusive, so you can't use them at
> the same time.
>
>> It seems that Pig will always bag up tuples between the input and
>> intermediate aggregate stages. For my workload those bags get large and
>> spill to disk. The spilling in and of itself seems to cause a lot of
>> memory pressure and then GC and slowdown.
>
> That said, I don't understand why you're seeing bag spilling in combiners.
> Bags are usually small on the mapper side and don't spill to disk because
> only records that are grouped by the same key are bagged together. Bag
> spilling usually happens on the reducer side. Do you have skewed keys in
> your group-by? Or can't you increase the parallelism of mappers to spread
> out the load across more mappers?
>
> Thanks,
> Cheolsoo
>
> On Tue, Jul 15, 2014 at 4:53 PM, Adam Silberstein <[email protected]> wrote:
>
>> Hey All,
>> I’m struggling with performance of algebraic aggregates. It seems that
>> Pig will always bag up tuples between the input and intermediate aggregate
>> stages. For my workload those bags get large and spill to disk. The
>> spilling in and of itself seems to cause a lot of memory pressure and then
>> GC and slowdown.
>>
>> The aggregates I am computing are things like MAX, where I would really
>> like to stream the records through the input stage and only maintain the
>> current max. Is this possible with algebraic, accumulator or anything else?
>>
>> Thanks,
>> Adam
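
For reference, the difference Adam describes between bagging records and streaming them can be sketched in plain Python. This is only an illustration under stated assumptions, not Pig's API: `max_by_bagging`, `StreamingMax`, `accumulate`, and `get_value` are hypothetical names chosen for the example, loosely modeled on the accumulator idea of receiving input in batches and keeping a small running state.

```python
from typing import Iterable, List, Optional


def max_by_bagging(records: Iterable[int]) -> Optional[int]:
    """Materialize every record into a list (the 'bag') before aggregating."""
    bag: List[int] = list(records)  # memory grows with the number of records
    return max(bag) if bag else None


class StreamingMax:
    """Hypothetical accumulator-style aggregate: keeps only the current max."""

    def __init__(self) -> None:
        self.current: Optional[int] = None

    def accumulate(self, batch: Iterable[int]) -> None:
        # Each batch can be discarded after this call; only one value is kept.
        for value in batch:
            if self.current is None or value > self.current:
                self.current = value

    def get_value(self) -> Optional[int]:
        return self.current


if __name__ == "__main__":
    streaming = StreamingMax()
    for batch in ([3, 9], [2, 7], [5]):
        streaming.accumulate(batch)
    print(streaming.get_value(), max_by_bagging([3, 9, 2, 7, 5]))
```

Both approaches return the same answer; the streaming version holds a single value no matter how many records pass through, which is the behavior the original question asks for.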
