Hi,

I am facing a weird issue.Probably I am missing something.Could somebody
please guide me.

I have three tables in hive.

Table A - clientid,shopid,shopposition,shopdate,shopitemid,shopitemdescr
(partitioned by date)
Table B- shopid,shopposition (distinct shopid,shopposition from another
table)
Table C - clientid,clientname,clientcity

I would like to join the tables above to find the following information for
a date
- shopid,shopposition,shopdate,shopitemid,shopitemdescr,clientname,clientcity

I wrote something like - select <<above fields>> from A a join B b on
(a.shopid=b.shopid and a.shopposition=b.shopposition) join C
c(c.clientid=a.clientid)

The result is not what I expected and has lot of duplicates.

Whereas if I create a seperate table from the first join - create table D
as select <<relevant fields>> from A a join B b on (a.shopid=b.shopid and
a.shopposition=b.shopposition)

and then join with C - select <<fields>> from D d join C c on (
d.clientid=c.clientid) the results are as expected.

For example if Table A has 8 rows then following the second approach gives
me 8 rows with proper fields but the the former approach I get lot of rows
( probably a cartesian product)

I am confused about whats going wrong with the first approach,could
somebody throw some light please.

---------------------------
Thanks & Regards
Himanish

Reply via email to