Hi Adam,

Pig doesn't reorder arrays when it loads them into bags. After a GROUP BY, however, the order of tuples within each bag on the reducer side is not deterministic. So, for example, if you do a LIMIT n in a nested FOREACH after the GROUP BY, you can get different results on every run.
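
If the result has to be stable, one option is to sort the bag explicitly before limiting it. Here is a rough sketch; the file name, relation names, and schema are made up for illustration:

```pig
-- Hypothetical input: one event per line with (user, ts, url).
events  = LOAD 'events.tsv' AS (user:chararray, ts:long, url:chararray);
grouped = GROUP events BY user;

top3 = FOREACH grouped {
    -- Impose an explicit order inside the bag before limiting,
    -- instead of relying on whatever order the reducer produced.
    sorted = ORDER events BY ts DESC;
    newest = LIMIT sorted 3;
    GENERATE group AS user, newest;
};

DUMP top3;
```

Without the nested ORDER, the LIMIT picks whichever three tuples happen to come first in the bag. Even with it, tuples that tie on ts can still come back in any order, so add more sort keys if that matters.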

Thanks,
Cheolsoo

On Thu, Jun 19, 2014 at 11:56 AM, Adam Silberstein <[email protected]> wrote:

> Hey All,
> I have a question about Pig’s guarantees around the order of tuples in
> bags. I am trying to decide how paranoid to be about this.
>
> Documentation says that bags are unordered. But, in practice, I have
> never seen Pig re-order the tuples in a default data bag and nothing about
> the current implementation suggests they can get out of order.
>
> Also, if you look at PigAvroStorage or JsonStorage (at least the
> elephant-bird version), both read in arrays as bags. Does that mean they
> implicitly don’t care about maintaining order in arrays? Or are they
> counting on the current implementation to keep them in order.
>
> Thanks for any insights on this!
> Adam
