Do you mean you want to find the top 5 per input record? Also, what are your
ordering criteria? Just sort by id? Something like this should order all
tuples in each bag by id and then produce the top 5 for each bag. My syntax may
be a little off, as I'm working offline and don't have the manual in front of
me, but this should be the general idea.

A = load 'yourinput' as (b:bag{});
B = foreach A {
        B1 = order b by $0; -- order the tuples in the bag on the id (first field)
        B2 = limit B1 5;    -- keep the first 5 tuples of the sorted bag
        generate flatten(B2);
}
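
If instead you want the top 5 across all users, one possible reading is "the 5 
ids that occur most often over every bag". That reading is an assumption on my 
part, since the ordering criteria are still open. Under that assumption, a 
rough, untested sketch would flatten the bags, count occurrences per id, and 
keep the 5 largest counts. The aliases and the `cnt` field name are made up for 
illustration.

```pig
A = load 'yourinput' as (b:bag{});
-- one output row per tuple, across all users
B = foreach A generate flatten(b);
-- keep only the id, assumed to be the first field of each tuple
C = foreach B generate $0 as id;
D = group C by id;
-- number of times each id appears over the whole input
E = foreach D generate group as id, COUNT(C) as cnt;
F = order E by cnt desc;
G = limit F 5;
dump G;
```

If "top" means something else (for example the 5 largest id values), only the 
order statement and the aggregate in front of it would need to change.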

Alan.

On Nov 5, 2013, at 9:52 AM, Sameer Tilak wrote:

> Hi Pig experts,
> Sorry to post so many questions, I have one more question on doing some 
> analytics on bag of tuples.
> 
> My input has the following format:
> 
> {(id1,x,y,z), (id2, a, b, c), (id3,x,a)}  /* User 1 info */
> {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
> {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
> {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */
> 
> I can change my UDF to give more simple output. However, I want to find out 
> if something like this can be done easily:
> I would like to find out top 5 ids (field 1 in a tuple) among all the users. 
> Note that each user has a bag and the first field of each tuple in that bag 
> is id. 
> 
> How difficult will it be to filter based on fields of tuples and do analytics 
> across the entire user base?
> 

