OK, let me state what I think happens (from reading 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage), 
and I'd be happy if someone could confirm or correct me.
It looks like, no matter how many bags there are, when the accumulator is used 
the same number of tuples is transferred from each bag in every batch: the first 
pig.accumulative.batchsize tuples of each bag, then the next 
pig.accumulative.batchsize, and so on until all the bags are exhausted. Only 
then is getValue() called.
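To make the described behaviour concrete, here is a minimal Java sketch that simulates it. It is a hypothetical illustration of the scheme as I understand it, not Pig's actual POPackage code: the class AccumulatorBatchSim and its method batches() are invented names, tuples are modelled as plain Strings, and representing an already-exhausted bag as an empty sub-bag is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (NOT Pig's POPackage): every bag contributes up to
// batchSize tuples per accumulate() call, and calls continue until every bag
// is exhausted. Tuples are modelled as plain Strings.
class AccumulatorBatchSim {

    // Batch i holds, for each input bag, the tuples at positions
    // [i*batchSize, (i+1)*batchSize). ASSUMPTION: a bag that has already
    // run out contributes an empty sub-bag to the remaining batches.
    static List<List<List<String>>> batches(List<List<String>> bags, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int longest = 0;
        for (List<String> bag : bags) {
            longest = Math.max(longest, bag.size());
        }
        List<List<List<String>>> result = new ArrayList<>();
        for (int start = 0; start < longest; start += batchSize) {
            List<List<String>> batch = new ArrayList<>();
            for (List<String> bag : bags) {
                int from = Math.min(start, bag.size());
                int to = Math.min(start + batchSize, bag.size());
                batch.add(new ArrayList<>(bag.subList(from, to)));
            }
            result.add(batch);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two bags of different lengths, batch size 2.
        List<List<String>> bags = List.of(
                List.of("a1", "a2", "a3", "a4", "a5"),
                List.of("b1", "b2"));
        int call = 0;
        for (List<List<String>> batch : batches(bags, 2)) {
            System.out.println("accumulate #" + (++call) + ": " + batch);
        }
        // getValue() would be called only after all batches are delivered.
        System.out.println("getValue()");
    }
}
```

With a batch size of 2, the five-tuple bag drives three accumulate() calls; the two-tuple bag is used up after the first call and contributes nothing afterwards.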

Is this right? 

    On Friday, February 26, 2016 10:47 PM, Eyal Allweil 
<eyal_allw...@yahoo.com> wrote:
 

 I asked this question on Stack Overflow, but this is a better place to ask.
What happens when a tuple with more than one bag is sent to a UDF that 
implements Accumulator (in a case where the accumulator should be used)? Does 
the first bag get sent in batches while subsequent bags are sent in their 
entirety? Are all the bags sent in batches? Or is the accumulator not used at all?
Here's a link to the question there:
http://stackoverflow.com/questions/35610426/how-does-pig-handle-tuples-with-more-than-one-bag-when-using-the-accumulator
Thanks,
Eyal
