Hello Hive Users,

I’m trying to understand how Bucket Map Join works in Hive, but I’ve run into an issue that I need help with. Here’s what I did:

First, I created a Hive table with the following statement:

    create table map_join_tb( id int ) clustered by (id) into 32 buckets;

Then I inserted 8 million rows into the table, with the ‘id’ field ranging from 1 to 8 million. After the data was bucketed, each bucket was approximately 2 MB in size.

Next, I set the following bucket map join configurations:

    set hive.optimize.bucketmapjoin=true;
    set hive.enforce.bucketmapjoin=true;

Finally, I ran EXPLAIN on the following query:

    explain select * from map_join_tb a join map_join_tb b on a.id=b.id;

Surprisingly, the plan shows a Reduce Join being performed rather than a Bucket Map Join, and I’m not sure why. Under what conditions does Hive decide to perform a Bucket Map Join?

For reference, I am using Hive version 3.1.2. Any help or insights would be greatly appreciated.

Thank you very much in advance.

Best Regards,
[smartli]