Hi Adam,

Pig doesn't reorder arrays when loading them. But when you do a group-by, the
order of tuples within each bag on the reducer side is not deterministic,
since it depends on the order in which map outputs are shuffled and merged.
So, for example, if you do "limit n" in a nested foreach after a group-by,
you can get different results from run to run.
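A minimal sketch of that situation, assuming a hypothetical input path 'events' with a made-up (uid, ts) schema. The first nested LIMIT keeps whichever tuples the reducer happens to see first. The second sorts the bag inside the nested foreach before limiting, which makes the result repeatable as long as there are no ties on the sort key.

```pig
-- Hypothetical input: one record per (uid, ts) event.
events = LOAD 'events' AS (uid:chararray, ts:long);
by_user = GROUP events BY uid;

-- Nondeterministic: which 3 tuples survive depends on the order
-- in which the reducer receives them.
some3 = FOREACH by_user {
    picked = LIMIT events 3;
    GENERATE group, picked;
};

-- Deterministic (assuming no ties on ts): sort the bag inside the
-- nested foreach, then limit.
latest3 = FOREACH by_user {
    sorted = ORDER events BY ts DESC;
    picked = LIMIT sorted 3;
    GENERATE group, picked;
};

DUMP latest3;
```

If ts can have ties, tuples with equal ts may still come out in any order, so a secondary sort key would be needed for fully stable output.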

Thanks,
Cheolsoo

On Thu, Jun 19, 2014 at 11:56 AM, Adam Silberstein wrote:

> Hey All,
> I have a question about Pig’s guarantees around the order of tuples in
> bags.  I am trying to decide how paranoid to be about this.
>
> Documentation says that bags are unordered.  But, in practice, I have
> never seen Pig re-order the tuples in a default data bag and nothing about
> the current implementation suggests they can get out of order.
>
> Also, if you look at PigAvroStorage or JsonStorage (at least the
> elephant-bird version), both read in arrays as bags.  Does that mean they
> implicitly don’t care about maintaining order in arrays?  Or are they
> counting on the current implementation to keep them in order?
>
> Thanks for any insights on this!
> Adam
