Hi,

On running the following query I am getting multiple records with same
value of F1

SELECT F1, COUNT(*)
FROM
(
SELECT F1, F2, COUNT(*)
FROM TABLE1
GROUP BY F1, F2
) a
GROUP BY F1;

As per what I understand there are multiple number of records based on
number of reducers.

Replicating the test scenario:
STEP1: get the dataset as available in
http://snap.stanford.edu/data/amazon0302.html

STEP2: Open the file and delete the heading

STEP3: hadoop fs -mkdir /test

STEP4: hadoop fs -put amazon0302.txt /test

STEP5: create external table test (f1 int, f2 int) row format delimited
fields terminated by '\t' lines terminated by '\n' stored as textfile
location '/test';

STEP6: create table test1 location '/test1' as select left_table.* from
(select * from test where f1<10000) left_table join (select * from test
where f1 < 10000) right_table;

STEP7: hadoop fs -mkdir /test2

STEP8: create table test2 location '/test2' as select f1, count(*) from
(select f1, f2, count(*) from test1 group by f1, f2) a group by f1;

STEP9: select * from test2 where f1 = 9887;

ENVIRONMENT:
HADOOP 2.0.4
HIVE 0.11

Please do let me know whether I am doing anything wrong.


Thanks and Regards,
Gourav Sengupta

(PS: my apologies for having posted this issue earlier by mistake to the
Pig user mailing list)

Reply via email to