Do you mean  select a,b from a inner join b on (a.id=b.id) ? Or do those
brackets make a difference? I ask because the inner keyword is nowhere
mentioned in the language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

Any hints?




On Fri, Oct 21, 2011 at 8:47 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
>
> On Fri, Oct 21, 2011 at 10:21 AM, john smith <js1987.sm...@gmail.com> wrote:
>
>> Hi Edward,
>>
>> Thanks for replying. I have been using the query
>>
>> "select a,b from a,b where a.id=b.id". As far as I understand Hive,
>> it reads the data of both A and B, emits <join_key, rowid/required row
>> data> pairs as map outputs, and then performs a Cartesian join on the
>> reduce side among rows that share the same join key.
>>
>> Is this the Cartesian join you are referring to? Or is it the Cartesian
>> product of the whole tables (as in SQL)? Or am I missing something?
>>
>> Can you please shed some light on what hive.mapred.mode=strict does?
>>
>> Thanks,
>> jS
>>
>> On Fri, Oct 21, 2011 at 7:29 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>>
>>>
>>> On Fri, Oct 21, 2011 at 9:22 AM, john smith <js1987.sm...@gmail.com> wrote:
>>>
>>>> Hi list,
>>>>
>>>> I am also facing the same problem. My reducers hang at this position and
>>>> it takes hours to complete a single reduce task. Can any Hive guru help us
>>>> out with this issue?
>>>>
>>>> Thanks,
>>>> jS
>>>>
>>>> 2011/10/21 bangbig <lizhongliangg...@163.com>
>>>>
>>>>> Hi all,
>>>>>
>>>>> Hive runs very slowly when it reaches this stage (see the log below).
>>>>> What is the problem? Is it because I'm joining two large tables?
>>>>>
>>>>> It runs pretty fast at first, but once the job reaches 95% it begins to
>>>>> slow down.
>>>>>
>>>>> --------------------------------------------------
>>>>>
>>>>> INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1044000000 rows
>>>>> 2011-10-21 16:55:57,427 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1045000000 rows
>>>>> 2011-10-21 16:55:57,545 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1046000000 rows
>>>>> 2011-10-21 16:55:57,686 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1047000000 rows
>>>>> 2011-10-21 16:55:57,806 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1048000000 rows
>>>>> 2011-10-21 16:55:57,926 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1049000000 rows
>>>>> 2011-10-21 16:55:58,045 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1050000000 rows
>>>>> 2011-10-21 16:55:58,164 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1051000000 rows
>>>>> 2011-10-21 16:55:58,284 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1052000000 rows
>>>>> 2011-10-21 16:55:58,405 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1053000000 rows
>>>>> 2011-10-21 16:55:58,525 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1054000000 rows
>>>>> 2011-10-21 16:55:58,644 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1055000000 rows
>>>>> 2011-10-21 16:55:58,764 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1056000000 rows
>>>>> 2011-10-21 16:55:58,883 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1057000000 rows
>>>>> 2011-10-21 16:55:59,003 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1058000000 rows
>>>>> 2011-10-21 16:55:59,122 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1059000000 rows
>>>>> 2011-10-21 16:55:59,242 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1060000000 rows
>>>>> 2011-10-21 16:55:59,361 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1061000000 rows
>>>>> 2011-10-21 16:55:59,482 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1062000000 rows
>>>>> 2011-10-21 16:55:59,601 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1063000000 rows
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>> It is hard to say without seeing the query, the table definition, and the
>>> EXPLAIN output. Please send the query. I do have a theory, though:
>>>
>>> This query is not good:
>>> select a,b from a,b where a.id=b.id
>>> It does a Cartesian join.
>>>
>>> This query is better:
>>> select a,b from a inner join b on (a.id=b.id)
>>>
>>> Consider setting this in your hive-site.xml:
>>>
>>> hive.mapred.mode=strict
>>>
>>> It can prevent you from running dangerous queries.
>>>
>>>
>>
> To be clear:
>
> Do NOT join this way (it results in a Cartesian product):
>
> select a,b from a,b where a.id=b.id
>
> Join this way:
>
> select a,b from a join b on (a.id=b.id)
>
> Also:
> set hive.mapred.mode=strict in your hive-site.xml to prevent yourself from
> mistakenly running Cartesian products and other dangerous queries.
>
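
For what it's worth, the per-key behavior jS describes above (map outputs
keyed by join key, then a Cartesian product among rows sharing a key on the
reduce side) can be sketched in plain Python. This is only an illustration
of a generic reduce-side equi-join, not Hive's actual JoinOperator; the
function name reduce_side_join and the sample data are made up for the
example.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(rows_a, rows_b):
    """Join two lists of (join_key, payload) pairs, reduce-side style."""
    # "Map + shuffle": group the rows of each table by join key, so that
    # all rows sharing a key end up in the same group (same reducer).
    groups = defaultdict(lambda: ([], []))
    for key, payload in rows_a:
        groups[key][0].append(payload)
    for key, payload in rows_b:
        groups[key][1].append(payload)

    # "Reduce": for each key, emit the cross product of the two sides.
    for key, (side_a, side_b) in groups.items():
        for a, b in product(side_a, side_b):
            yield key, a, b


# Hypothetical data: key 1 is heavily repeated in both tables, key 2 is not.
rows_a = [(1, f"a{i}") for i in range(300)] + [(2, "a_other")]
rows_b = [(1, f"b{i}") for i in range(300)] + [(2, "b_other")]

joined = list(reduce_side_join(rows_a, rows_b))

# Count how many output rows each join key produced.
per_key = defaultdict(int)
for key, _, _ in joined:
    per_key[key] += 1

print(dict(per_key))  # {1: 90000, 2: 1}
```

In this sketch a key that appears 300 times on each side produces 300 * 300
output rows while every other key produces one, so whichever reducer owns
that key does nearly all the work. Repeated join keys are one possible
reason a job looks fast until the last few reducers; whether that is what
is happening here depends on the actual query and data.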
