Hey Mohammad,

With c = TOP(5,3,a); you are saying: take the 5 records of a that have the largest values in column 3 (TOP counts columns from 0, so with the id loaded first this is the fourth field). Do you really need that selection by column value?
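To make the difference concrete, here is a minimal sketch in plain Python (not Pig) that contrasts what TOP(5,3,a) selects with the "each record plus its next 4" windows described in the question. The row ids 1-9 and the helper names top_n and sliding_windows are made up for illustration, and the sketch assumes the records are already in the order you want to walk through them.

```python
import heapq

# Sample rows from the original message. The leading id (1-9) is hypothetical
# and stands in for the HBase row key loaded by '-loadKey true'.
# Field order: (id, DGR, HD, POR, RES, RHOB, SON)
rows = [
    (1, 18.98, 2000, 1.21, 193.46, 2.64, 58.17),
    (2, 52.49, 2000.5, 4.32, 947.11, 2.74, 64.45),
    (3, 115.24, 2001, 16.8, 878.58, 2.66, 94.49),
    (4, 55.55, 2001.5, 33.03, 656.56, 2.82, 60.76),
    (5, 156.14, 2002, 35.52, 83.75, 2.6, 59.57),
    (6, 138.77, 2002.5, 21.51, 105.76, 2.62, 85.89),
    (7, 71.89, 2003, 27.79, 709.01, 2.63, 85.44),
    (8, 59.84, 2003.5, 32.1, 444.82, 2.72, 70.8),
    (9, 103.18, 2004, 4.09, 413.15, 2.8, 54.37),
]


def top_n(records, n, column):
    """Pick the n records with the largest value in the 0-based column.

    This mirrors the selection TOP(n, column, bag) makes. A Pig bag has no
    defined order; this sketch happens to return the largest value first.
    """
    return heapq.nlargest(n, records, key=lambda r: r[column])


def sliding_windows(records, size):
    """Yield each run of `size` consecutive records: 1-5, then 2-6, and so on."""
    for start in range(len(records) - size + 1):
        yield records[start:start + size]


top5 = top_n(rows, 5, 3)                  # chosen by value in column 3 (POR)
windows = list(sliding_windows(rows, 5))  # chosen by position only

print([r[0] for r in top5])
for window in windows:
    print([r[0] for r in window])
```

The first result depends only on the values in one column, while the second depends only on record position, which is why TOP by itself does not produce the 1-5, 2-6, ... groups.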
-----Original Message-----
From: Mohammad Tariq [mailto:donta...@gmail.com]
Sent: Monday, May 21, 2012 3:54 PM
To: user@pig.apache.org
Subject: How to use TOP?

Hello list,

I have an HDFS file with 6 columns that contain some data stored in an HBase table. The data looks like this:

18.98   2000    1.21   193.46  2.64  58.17
52.49   2000.5  4.32   947.11  2.74  64.45
115.24  2001    16.8   878.58  2.66  94.49
55.55   2001.5  33.03  656.56  2.82  60.76
156.14  2002    35.52  83.75   2.6   59.57
138.77  2002.5  21.51  105.76  2.62  85.89
71.89   2003    27.79  709.01  2.63  85.44
59.84   2003.5  32.1   444.82  2.72  70.8
103.18  2004    4.09   413.15  2.8   54.37

Now I have to take each record along with its next 4 records and do some processing (for example, in the first pass I have to take records 1-5, in the next pass records 2-6, and so on). I am trying to use TOP for this, but I am getting the following error:

2012-05-21 17:04:30,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <line 6, column 37> Invalid scalar projection: parameters : A column needs to be projected from a relation for it to be used as a scalar
Details at logfile: /home/mohammad/pig-0.9.2/logs/pig_1337599211281.log

I am using the following commands:

grunt> a = load 'hbase://logdata'
>> using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'cf:DGR cf:HD cf:POR cf:RES cf:RHOB cf:SON', '-loadKey true') as (id,
>> DGR, HD, POR, RES, RHOB, SON);
grunt> b = foreach a { c = TOP(5,3,a);
>> generate flatten(c);
>> }

Could anyone tell me how to achieve that? Many thanks.

Regards,
Mohammad Tariq