Hey Mohammad,

With
c = TOP(5,3,a);
you are asking for the 5 records of a that have the largest values in
column 3 (TOP counts columns from 0, so that is the fourth field). Do you
really need the records ordered by that column?
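
If what you actually need is the processing described in your message below
(records 1-5, then 2-6, and so on), that is a sliding window over the records
in their existing order, which is not what TOP computes. A rough Python sketch
of the difference, for illustration only: this is not Pig code, the name
sliding_windows is made up for this example, and it assumes the rows arrive in
file order and uses just the first column of your sample data.

```python
def sliding_windows(records, size=5):
    """Yield each record together with the records that follow it, `size` at a time."""
    records = list(records)
    # The last full window starts at index len(records) - size.
    for start in range(len(records) - size + 1):
        yield records[start:start + size]


# First column of the sample rows from the quoted message, in file order.
rows = [18.98, 52.49, 115.24, 55.55, 156.14, 138.77, 71.89, 59.84, 103.18]

# Sliding windows keep the original order: rows 1-5, 2-6, ..., 5-9.
windows = list(sliding_windows(rows))

# A "top 5" picks the largest values regardless of position.
top5 = sorted(rows, reverse=True)[:5]
```

With the 9 sample rows this gives 5 windows, each in file order, while top5
holds the five largest values wherever they occurred.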

-----Original Message-----
From: Mohammad Tariq [mailto:donta...@gmail.com] 
Sent: Monday, May 21, 2012 3:54 PM
To: user@pig.apache.org
Subject: How to use TOP?

Hello list,

  I have an HDFS file that has 6 columns that contain some data stored in an
HBase table. The data looks like this -

18.98   2000    1.21   193.46  2.64  58.17
52.49   2000.5  4.32   947.11  2.74  64.45
115.24  2001    16.8   878.58  2.66  94.49
55.55   2001.5  33.03  656.56  2.82  60.76
156.14  2002    35.52  83.75   2.6   59.57
138.77  2002.5  21.51  105.76  2.62  85.89
71.89   2003    27.79  709.01  2.63  85.44
59.84   2003.5  32.1   444.82  2.72  70.8
103.18  2004    4.09   413.15  2.8   54.37

Now I have to take each record along with its next 4 records and do some
processing (for example, on the first pass I have to take records 1-5, on the
next pass records 2-6, and so on). I am trying to use TOP for this, but I am
getting the following error -

2012-05-21 17:04:30,328 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1200: Pig script failed to parse:
<line 6, column 37> Invalid scalar projection: parameters : A column needs
to be projected from a relation for it to be used as a scalar Details at
logfile: /home/mohammad/pig-0.9.2/logs/pig_1337599211281.log

I am using the following commands -

grunt> a = load 'hbase://logdata'
>> using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'cf:DGR cf:HD cf:POR cf:RES cf:RHOB cf:SON', '-loadKey true') as (id, 
>> DGR, HD, POR, RES, RHOB, SON);
grunt> b = foreach a { c = TOP(5,3,a);
>> generate flatten(c);
>> }

Could anyone tell me how to achieve that? Many thanks.

Regards,
    Mohammad Tariq
