Hi Bejoy,
  I am joining two tables which are both bucketed 64 ways and i want to do
a bucketed map join on them. I set the flag set hive.optimize.bucketmapjoin
= true;. The auto.convert.join is always false on our cluster. When i run
the following query:

select /*+ MAPJOIN(b) */ a.visitor_id FROM
amit_merchinteraction a join amit_dse_test_cell_allocation_f b
ON a.visitor_id == b.account_id
where a.country_id = 1 and a.dateint <= 20120322 and a.dateint >= 20120315
;

Hive sequentially creates a hash map using the contents of the mapjoin
table b, on the client , one at a time. Is that expeced behaviour? Should
it not create these hash maps on the corresponding mappers in parallel?

Thanks,
Amit

On Thu, Jan 19, 2012 at 9:22 AM, Bejoy Ks <bejoy...@yahoo.com> wrote:

> Hi Avrila
>        AFAIK the bucketed map join is not default in hive and it happens
> only when the values is set to true. It could be because the same value is
> already set in the hive configuration xml file. To cross confirm the same
> could you explicitly set this to false
> (set hive.optimize.bucketmapjoin = false;)and get the query execution
> plan from explain command.
>
> Please some pointers in line
>
> 1. Should I see sth different in the explain extended output if I set and
> unset the hive.optimize.bucketmapjoin option?
> [Bejoy] you should be seeing the same
> Try EXPLAIN your join query after setting this
> set hive.optimize.bucketmapjoin = false;
>
> 2. Should I see something different in the output of hive while running
> the query if again I set and unset the hive.optimize.bucketmapjoin?
> [Bejoy] No,Hive output should be the same. What ever is the execution plan
> for an join, optimally the end result should be same.
>
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will
> still perform a normal map-side join for some reason? How can I check if
> this has actually happened?
> [Bejoy] Hive would perform a plain map side join only if the following
> parameter is enabled. (default it is disabled)
> set hive.auto.convert.join = true; you need to check this value in your
> configurations.
> If it is enabled irrespective of the table size hive would always try a
> map join, it would come to a normal join only after the map join attempt
> fails.
> AFAIK, if the number of buckets are same or multiples between the two
> tables involved in a join and if the join is on the same columns that are
> bucketed, with bucketmapjoin enabled it shouldn't execute a plain mapside
> join a bucketed map side join would be triggered.
>
> Hope it helps!..
>
> Regards
> Bejoy.K.S
>
>   ------------------------------
> *From:* Avrilia Floratou <flora...@cs.wisc.edu>
> *To:* user@hive.apache.org
> *Sent:* Thursday, January 19, 2012 9:23 PM
> *Subject:* Question on bucketed map join
>
> Hi,
>
> I have two tables with 8 buckets each on the same key and want to join
> them.
> I ran "explain extended" and get the plan produced by HIVE which shows
> that a map-side join is a possible plan.
>
> I then set in my script the hive.optimize.bucketmapjoin option to true and
> reran the "explain extended" query. I get the exact same plans as output.
>
> I ran the query with and without the bucketmapjoin optimization and saw no
> difference in the running time.
>
> I have the following questions:
>
> 1. Should I see sth different in the explain extended output if I set and
> unset the hive.optimize.bucketmapjoin option?
>
> 2. Should I see something different in the output of hive while running
> the query if again I set and unset the hive.optimize.bucketmapjoin?
>
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will
> still perform a normal map-side join for some reason? How can I check if
> this has actually happened?
>
> Thanks,
> Avrilia
>
>

Reply via email to