Hi Bejoy, I am joining two tables which are both bucketed 64 ways and i want to do a bucketed map join on them. I set the flag set hive.optimize.bucketmapjoin = true;. The auto.convert.join is always false on our cluster. When i run the following query:
select /*+ MAPJOIN(b) */ a.visitor_id FROM amit_merchinteraction a join amit_dse_test_cell_allocation_f b ON a.visitor_id == b.account_id where a.country_id = 1 and a.dateint <= 20120322 and a.dateint >= 20120315 ; Hive sequentially creates a hash map using the contents of the mapjoin table b, on the client , one at a time. Is that expeced behaviour? Should it not create these hash maps on the corresponding mappers in parallel? Thanks, Amit On Thu, Jan 19, 2012 at 9:22 AM, Bejoy Ks <bejoy...@yahoo.com> wrote: > Hi Avrila > AFAIK the bucketed map join is not default in hive and it happens > only when the values is set to true. It could be because the same value is > already set in the hive configuration xml file. To cross confirm the same > could you explicitly set this to false > (set hive.optimize.bucketmapjoin = false;)and get the query execution > plan from explain command. > > Please some pointers in line > > 1. Should I see sth different in the explain extended output if I set and > unset the hive.optimize.bucketmapjoin option? > [Bejoy] you should be seeing the same > Try EXPLAIN your join query after setting this > set hive.optimize.bucketmapjoin = false; > > 2. Should I see something different in the output of hive while running > the query if again I set and unset the hive.optimize.bucketmapjoin? > [Bejoy] No,Hive output should be the same. What ever is the execution plan > for an join, optimally the end result should be same. > > 3. Is it possible that even though I set bucketmapjoin to true, Hive will > still perform a normal map-side join for some reason? How can I check if > this has actually happened? > [Bejoy] Hive would perform a plain map side join only if the following > parameter is enabled. (default it is disabled) > set hive.auto.convert.join = true; you need to check this value in your > configurations. > If it is enabled irrespective of the table size hive would always try a > map join, it would come to a normal join only after the map join attempt > fails. > AFAIK, if the number of buckets are same or multiples between the two > tables involved in a join and if the join is on the same columns that are > bucketed, with bucketmapjoin enabled it shouldn't execute a plain mapside > join a bucketed map side join would be triggered. > > Hope it helps!.. > > Regards > Bejoy.K.S > > ------------------------------ > *From:* Avrilia Floratou <flora...@cs.wisc.edu> > *To:* user@hive.apache.org > *Sent:* Thursday, January 19, 2012 9:23 PM > *Subject:* Question on bucketed map join > > Hi, > > I have two tables with 8 buckets each on the same key and want to join > them. > I ran "explain extended" and get the plan produced by HIVE which shows > that a map-side join is a possible plan. > > I then set in my script the hive.optimize.bucketmapjoin option to true and > reran the "explain extended" query. I get the exact same plans as output. > > I ran the query with and without the bucketmapjoin optimization and saw no > difference in the running time. > > I have the following questions: > > 1. Should I see sth different in the explain extended output if I set and > unset the hive.optimize.bucketmapjoin option? > > 2. Should I see something different in the output of hive while running > the query if again I set and unset the hive.optimize.bucketmapjoin? > > 3. Is it possible that even though I set bucketmapjoin to true, Hive will > still perform a normal map-side join for some reason? How can I check if > this has actually happened? > > Thanks, > Avrilia > >