Hello Team, I am trying to join a big table with around 30 columns with another table with just one column.
Table A (part of the ddl): CLUSTERED BY ( party_id) SORTED BY ( party_id ASC, acct_id ASC) INTO 64 BUCKETS Table B(part of the ddl): CLUSTERED BY ( party_id) SORTED BY ( party_id ASC) INTO 64 BUCKETS Table A has : 3 billion records. Table B has: 353 million records. Since both of them are clustered by same key, can we achieve a SMB map join, where there is no need of reducers? When I try running this join, I could see there are reducers in my job. So Can somebody help what is the purpose of reducers here? I have used the following settings to force SMB map-join. set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ; set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.auto.convert.sortmerge.join.noconditionaltask=true; set hive.auto.convert.join.use.nonstaged=true; Thanks, Murali