(answering again but now including the mailing list :P)

Thank you for your answer Suraj.

What you said is exactly what I expect, but I get something different.

Using your example (the specific data is not important here), the bag my
UDF receives contains more than one key, in sorted order. Here's a sample
of the code of my UDF:


DataBag bag = DataType.toBag(input.get(0));

Iterator<Tuple> it = bag.iterator();

while (it.hasNext()) {
    Tuple t = it.next();
    // Here I print the attribute used as the grouping key
}


What I get in the output is:

key1
key1
key1
key2

The point is that I'm using test data that is not really big (less than
64MB). Anyhow, Pig shouldn't put these keys together in the same bag! Maybe
this is a kind of optimization that I should turn off.
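
To rule out a misreading of the printout, a check along these lines could be
run on each bag: count the distinct keys inside one bag and print a single
line per bag. This is only a minimal sketch in plain Java, with no Pig
classes. It assumes a tuple can be stood in for by a String[] and a bag by a
List<String[]>; BagKeyCheck, groupByFirstField and distinctKeys are made-up
names for illustration, not Pig APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java stand-in (assumption): a "tuple" is a String[], a "bag" is a List<String[]>.
class BagKeyCheck {

    // Mimics grouping on the first field: one bag per distinct key.
    static Map<String, List<String[]>> groupByFirstField(List<String[]> records) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // Returns the distinct grouping keys found inside a single bag.
    static Set<String> distinctKeys(List<String[]> bag) {
        Set<String> keys = new LinkedHashSet<>();
        for (String[] tuple : bag) {
            keys.add(tuple[0]);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Sample records taken from the example quoted in this thread.
        List<String[]> records = Arrays.asList(
            new String[] {"key1", "d1", "d2"},
            new String[] {"key2", "d1", "d3"},
            new String[] {"key1", "d1", "d4"},
            new String[] {"key1", "d1", "d5"});

        for (Map.Entry<String, List<String[]>> group : groupByFirstField(records).entrySet()) {
            // One line per bag, so keys from different bags cannot be confused.
            System.out.println("bag for " + group.getKey()
                + ": size=" + group.getValue().size()
                + ", distinct keys=" + distinctKeys(group.getValue()));
        }
    }
}
```

If grouping behaves as expected, every bag reports exactly one distinct key.
Printing one line per bag also shows whether key1 and key2 really share a
bag or just appear one after another in the output.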


2014-07-12 23:29 GMT+02:00 Suraj Nayak <[email protected]>:

> Are you processing the bag in the UDF?
>
> Can you send sample records which are going into the UDF, using the dump
> command for alias C?
>
> If the data is (alias A)
> (key1,d1,d2)
> (key2,d1,d3)
> (key1,d1,d4)
> (key1,d1,d5)
>
> On grouping on 1st column the data should be grouped as below
>
> {(key1),{ (key1,d1,d2), (key1,d1,d4), (key1,d1,d5) }}
> {(key2),{ (key2,d1,d3) }}
>
> If you are providing the data A to the UDF, you should get all records
> for the same key in the same bag.
>
> --
> Suraj Nayak
>
