Hi, On running the following query I am getting multiple records with same value of F1
SELECT F1, COUNT(*) FROM ( SELECT F1, F2, COUNT(*) FROM TABLE1 GROUP BY F1, F2 ) a GROUP BY F1; As per what I understand there are multiple number of records based on number of reducers. Replicating the test scenario: STEP1: get the dataset as available in http://snap.stanford.edu/data/amazon0302.html STEP2: Open the file and delete the heading STEP3: hadoop fs -mkdir /test STEP4: hadoop fs -put amazon0302.txt /test STEP5: create external table test (f1 int, f2 int) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile location '/test'; STEP6: create table test1 location '/test1' as select left_table.* from (select * from test where f1<10000) left_table join (select * from test where f1 < 10000) right_table; STEP7: hadoop fs -mkdir /test2 STEP8: create table test2 location '/test2' as select f1, count(*) from (select f1, f2, count(*) from test1 group by f1, f2) a group by f1; STEP9: select * from test2 where f1 = 9887; ENVIRONMENT: HADOOP 2.0.4 HIVE 0.11 Please do let me know whether I am doing anything wrong. Thanks and Regards, Gourav Sengupta (PS: my apologies for having posted this issue earlier by mistake to the Pig user mailing list)