Hello Team,

I am trying to join a big table with around 30 columns with another table with 
just one column.

Table A (part of the ddl):

CLUSTERED BY (
  party_id)
SORTED BY (
  party_id ASC,
  acct_id ASC)
INTO 64 BUCKETS

Table B(part of the ddl):

CLUSTERED BY (
  party_id)
SORTED BY (
  party_id ASC)
INTO 64 BUCKETS

Table A has : 3 billion records.
Table B has: 353 million records.

Since both of them are clustered by same key, can we achieve a SMB map join, 
where there is no need of reducers? When I try running this join, I could see 
there are reducers in my job. So Can somebody help what is the purpose of 
reducers here?

I have used the following settings to force SMB map-join.

set 
hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.join.use.nonstaged=true;


Thanks,
Murali

Reply via email to