Hi, Wolli:
A cross join doesn't mean Hive has to use one reducer.
From the query's point of view, the following cases will use one reducer:
1) ORDER BY in your query (instead of SORT BY)
2) Only one reducer group, which means all the data has to be sent to one 
reducer.
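As a rough sketch of the first case, using the import1 table from your query (the selected columns are chosen only for illustration):

```sql
-- ORDER BY guarantees a total order over the whole result,
-- so Hive sends every row through a single reducer.
SELECT id1, keyword FROM import1 ORDER BY id1;

-- SORT BY only orders rows within each reducer,
-- so the work can be spread over several reducers.
SELECT id1, keyword FROM import1 SORT BY id1;

-- DISTRIBUTE BY controls which reducer a row goes to; combined with
-- SORT BY, all rows for one id1 land on the same reducer, sorted.
SELECT id1, keyword FROM import1 DISTRIBUTE BY id1 SORT BY id1;
```

Only the ORDER BY variant forces one reducer; the other two give up the global ordering in exchange for parallelism.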
In your case, the distinct count of id1 will be the reducer group count. Did you 
explicitly set the reducer count in your Hive session?
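To check or change this in the session, something like the following should work. This is a minimal sketch; the values 8 and 64 MB are arbitrary illustrations, not recommendations for your data:

```sql
-- Print the current setting; -1 means Hive estimates the reducer count itself.
SET mapred.reduce.tasks;

-- Option A: force a fixed number of reducers (8 is an arbitrary example value).
SET mapred.reduce.tasks=8;

-- Option B: keep the automatic estimate (-1) but lower the input bytes
-- assigned to each reducer, so Hive plans more reducers for the same input.
SET mapred.reduce.tasks=-1;
SET hive.exec.reducers.bytes.per.reducer=67108864;  -- 64 MB, example value
```

If the input tables are small, the automatic estimate can come out as one reducer even though the query itself does not require it, which may be what you are seeing.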
Yong 
Date: Wed, 5 Mar 2014 14:17:24 +0100
Subject: Best way to avoid cross join
From: darkwoll...@gmail.com
To: user@hive.apache.org

Hey everyone,
before I write a lot of text, I'll just post something that is already 
written: http://www.sqlservercentral.com/Forums/Topic1328496-360-1.aspx


The first post addresses a problem pretty similar to the one I have. Currently my 
implementation looks like this:
SELECT id1,
  MAX(CASE
    WHEN m.keyword IS NULL THEN 0
    WHEN instr(m.keyword, prep_kw.keyword) > 0 THEN 1
    ELSE 0
  END) AS flag
FROM (SELECT id1, keyword FROM import1) m
CROSS JOIN (SELECT keyword FROM et_keywords) prep_kw
GROUP BY id1;
Since there is a cross join involved, the execution gets pinned down to 1 
reducer only and it takes ages to complete. 

The thread I posted solves this with some SQL Server-specific tactics. But I 
was wondering if anybody has encountered the problem in Hive already and found 
a better way to solve this.

I'm using Hive 0.11 on a MapR Distribution, if this is somehow important.
Cheers
Wolli
