Hi, I am facing a weird issue.Probably I am missing something.Could somebody please guide me.
I have three tables in hive. Table A - clientid,shopid,shopposition,shopdate,shopitemid,shopitemdescr (partitioned by date) Table B- shopid,shopposition (distinct shopid,shopposition from another table) Table C - clientid,clientname,clientcity I would like to join the tables above to find the following information for a date - shopid,shopposition,shopdate,shopitemid,shopitemdescr,clientname,clientcity I wrote something like - select <<above fields>> from A a join B b on (a.shopid=b.shopid and a.shopposition=b.shopposition) join C c(c.clientid=a.clientid) The result is not what I expected and has lot of duplicates. Whereas if I create a seperate table from the first join - create table D as select <<relevant fields>> from A a join B b on (a.shopid=b.shopid and a.shopposition=b.shopposition) and then join with C - select <<fields>> from D d join C c on ( d.clientid=c.clientid) the results are as expected. For example if Table A has 8 rows then following the second approach gives me 8 rows with proper fields but the the former approach I get lot of rows ( probably a cartesian product) I am confused about whats going wrong with the first approach,could somebody throw some light please. --------------------------- Thanks & Regards Himanish
