You mean select a,b from a inner join b on (a.id=b.id)? Or do those brackets make some difference? I ask because the INNER keyword is not mentioned anywhere in the language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Any hints?

On Fri, Oct 21, 2011 at 8:47 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Fri, Oct 21, 2011 at 10:21 AM, john smith <js1987.sm...@gmail.com> wrote:
>
>> Hi Edward,
>>
>> Thanks for replying. I have been using the query
>>
>> "select a,b from a,b where a.id=b.id". As far as I know, Hive reads the
>> data of both A and B, emits <join_key, rowid/required row data> pairs
>> as map outputs, and then performs cartesian joins on the reduce side
>> for rows with the same join_key.
>>
>> Is this the cartesian join you are referring to? Or is it the cartesian
>> product of the entire tables (as in SQL)? Or am I missing something?
>>
>> Can you please throw some light on what hive.mapred.mode=strict does?
>>
>> Thanks,
>> jS
>>
>> On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> On Fri, Oct 21, 2011 at 9:22 AM, john smith <js1987.sm...@gmail.com> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I am also facing the same problem. My reducers hang at this point and
>>>> it takes hours to complete a single reduce task. Can any Hive guru help us
>>>> out with this issue?
>>>>
>>>> Thanks,
>>>> jS
>>>>
>>>> 2011/10/21 bangbig <lizhongliangg...@163.com>
>>>>
>>>>> Hi all,
>>>>>
>>>>> Hive runs too slowly when it is doing this kind of work (see the log
>>>>> below). What's the problem? Is it because I'm joining two large tables?
>>>>>
>>>>> It runs pretty fast at first. When the job is 95% complete, it begins to
>>>>> slow down.
>>>>>
>>>>> --------------------------------------------------
>>>>>
>>>>> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1044000000 rows
>>>>> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1045000000 rows
>>>>> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1046000000 rows
>>>>> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1047000000 rows
>>>>> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1048000000 rows
>>>>> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1049000000 rows
>>>>> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1050000000 rows
>>>>> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1051000000 rows
>>>>> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1052000000 rows
>>>>> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1053000000 rows
>>>>> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1054000000 rows
>>>>> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1055000000 rows
>>>>> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1056000000 rows
>>>>> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1057000000 rows
>>>>> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1058000000 rows
>>>>> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1059000000 rows
>>>>> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1060000000 rows
>>>>> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1061000000 rows
>>>>> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1062000000 rows
>>>>> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1063000000 rows
>>>
>>> It is hard to say without seeing the query, the table definition, and the
>>> EXPLAIN output. Please send the query. That said, I have a theory:
>>>
>>> This query is not good:
>>> select a,b from a,b where a.id=b.id
>>> It does a cartesian join.
>>>
>>> This query is better:
>>> select a,b from a inner join b on (a.id=b.id)
>>>
>>> Consider setting this in your hive-site.xml:
>>>
>>> hive.mapred.mode=strict
>>>
>>> It can prevent you from running dangerous queries.
>>
>
> To be clear:
>
> Do NOT join this way (it results in a cartesian product):
>
> select a,b from a,b where a.id=b.id
>
> Join this way:
>
> select a,b from a join b on (a.id=b.id)
>
> Also: set hive.mapred.mode=strict in your hive-site.xml to prevent yourself
> from mistakenly running cartesian products and other bad ideas.
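The two plans being compared in this thread can be sketched as a toy, single-process Python model. This is not Hive code: the tables and the function names (map_phase, keyed_join, cartesian_then_filter) are hypothetical. keyed_join follows john's description of a reduce-side join, where mappers emit (join_key, row) pairs and each key's group only crosses rows that share that key. cartesian_then_filter models what Edward says the comma form does: pair every row of A with every row of B, then apply the WHERE predicate.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy tables; each row is (id, payload).
TABLE_A = [(1, "a1"), (1, "a2"), (2, "a3"), (4, "a4")]
TABLE_B = [(1, "b1"), (2, "b2"), (2, "b3"), (3, "b4")]


def map_phase(rows, tag):
    """Mapper: emit (join_key, (tag, row)) so the shuffle can group rows by id."""
    for row in rows:
        yield row[0], (tag, row)


def keyed_join(left, right):
    """Reduce-side equi-join: cross product only inside each key's group."""
    # groups[key] holds (rows from left, rows from right) for that key.
    groups = defaultdict(lambda: ([], []))
    for key, (tag, row) in list(map_phase(left, 0)) + list(map_phase(right, 1)):
        groups[key][tag].append(row)  # the "shuffle": rows with equal ids meet
    result, pairs_examined = [], 0
    for key in sorted(groups):
        left_rows, right_rows = groups[key]
        pairs_examined += len(left_rows) * len(right_rows)
        result.extend(product(left_rows, right_rows))
    return result, pairs_examined


def cartesian_then_filter(left, right):
    """Every left row meets every right row; the WHERE test is applied afterwards."""
    pairs_examined = len(left) * len(right)
    result = [(l, r) for l, r in product(left, right) if l[0] == r[0]]
    return result, pairs_examined


if __name__ == "__main__":
    joined, keyed_pairs = keyed_join(TABLE_A, TABLE_B)
    filtered, cart_pairs = cartesian_then_filter(TABLE_A, TABLE_B)
    print(sorted(joined) == sorted(filtered))  # same output rows either way
    print(keyed_pairs, cart_pairs)             # candidate pairs each plan examines
```

Both functions return the same four rows, but on these toy tables the keyed join examines 4 candidate pairs while the cartesian form examines 16 (|A| × |B|). The per-key product john describes costs |A_k| × |B_k| for each key k, whereas a true cartesian product costs |A| × |B| regardless of the predicate, and that gap grows quickly with table size.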