Questions Regarding Bucket Map Join in Hive

smart li Sun, 25 Jun 2023 04:43:40 -0700

Hello Hive Users,

I’m trying to understand how Bucket Map Join works in Hive, and I have run
into an issue I need help with. Here’s what I did:

Firstly, I created a Hive table using the following statement:

create table map_join_tb(
id int
)
clustered by (id) into 32 buckets;

Then I inserted 8 million rows into the table, with ‘id’ values ranging
from 1 to 8,000,000. After bucketing, each bucket file was approximately
2 MB in size.
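As a rough sanity check on the ~2 MB per-bucket figure, here is a back-of-the-envelope sketch. It assumes two things not stated in this message: the table uses the plain TEXTFILE format (each id written as its decimal digits plus a newline, no compression), and rows are spread evenly across the 32 buckets. The helper name text_bytes_for_ids is purely illustrative.

```python
# Back-of-the-envelope size estimate for map_join_tb.
# ASSUMPTIONS (not confirmed by the original post):
#   - TEXTFILE storage: each int id is stored as decimal digits + '\n'
#   - no compression
#   - rows are distributed evenly across buckets

NUM_ROWS = 8_000_000
NUM_BUCKETS = 32


def text_bytes_for_ids(n: int) -> int:
    """Total bytes needed to store ids 1..n as decimal text, one per line."""
    total = 0
    digits = 1
    low = 1
    while low <= n:
        # All ids in [low, high] have the same number of decimal digits.
        high = min(n, 10 ** digits - 1)
        total += (high - low + 1) * (digits + 1)  # digits plus the newline
        low = high + 1
        digits += 1
    return total


rows_per_bucket = NUM_ROWS // NUM_BUCKETS
total_bytes = text_bytes_for_ids(NUM_ROWS)
bytes_per_bucket = total_bytes / NUM_BUCKETS

print(rows_per_bucket)                    # 250000
print(total_bytes)                        # 62888896
print(round(bytes_per_bucket / 1e6, 2))   # 1.97 (decimal MB)
```

Under these assumptions each bucket holds about 250,000 ids and roughly 1.97 MB of text, which is consistent with the ~2 MB observed. A different file format (ORC, Parquet) or compression would give different numbers.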

I then set the following bucket map join configurations:

set hive.optimize.bucketmapjoin=true;
set hive.enforce.bucketmapjoin=true;

Lastly, I ran an EXPLAIN on the following SQL:

explain select * from map_join_tb a join map_join_tb b on a.id=b.id;

Surprisingly, the plan shows a Reduce Join rather than a Bucket Map Join,
and I’m not sure why. Under what conditions does Hive decide to perform a
Bucket Map Join?

For reference, I am using Hive version 3.1.2. Any help or insights into
this would be greatly appreciated. Thank you very much in advance.

Best Regards,
[smartli]