Anyone knows the answer? Thanks.
On Tue, Nov 18, 2014 at 2:20 PM, Hao Zhu <h...@maprtech.com> wrote: > Hi All, > > I am testing the different explain plan output and found it hard to find > the difference between Bucket Map Join(non sort merged) and normal Map Join. > > The tested hive version is 0.13. > To enable Bucket Map Join, there are 2 conditions: > 1. set hive.optimize.bucketmapjoin = true; > 2. If the tables being joined are bucketized on the join columns, and the > number of buckets in one table is a multiple of the number of buckets in > the other table, the buckets can be joined with each other. > > Here are my tests: > *1. Create 2 Bucketed Tables(non-sorted.)* > create table b1(col0 string,col1 string,col2 string,col3 string,col4 > string,col5 string,col6 string) > clustered by (col0) into 32 buckets; > create table b2(col0 string,col1 string,col2 string,col3 string,col4 > string,col5 string,col6 string) > clustered by (col0) into 8 buckets; > set hive.enforce.bucketing = true; > From passwords insert OVERWRITE table b1 select * limit 10000; > From passwords insert OVERWRITE table b2 select * limit 10000; > > *2. set hive.optimize.bucketmapjoin = true;* > >> hive> explain select b1.* from b1,b2 where b1.col0=b2.col0; >> OK >> STAGE DEPENDENCIES: >> Stage-4 is a root stage >> Stage-3 depends on stages: Stage-4 >> Stage-0 is a root stage >> STAGE PLANS: >> Stage: Stage-4 >> Map Reduce Local Work >> Alias -> Map Local Tables: >> b1 >> Fetch Operator >> limit: -1 >> Alias -> Map Local Operator Tree: >> b1 >> TableScan >> alias: b1 >> Statistics: Num rows: 10000 Data size: 478869 Basic stats: >> COMPLETE Column stats: NONE >> HashTable Sink Operator >> condition expressions: >> 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6} >> 1 {col0} >> keys: >> 0 col0 (type: string) >> 1 col0 (type: string) >> Stage: Stage-3 >> Map Reduce >> Map Operator Tree: >> TableScan >> alias: b2 >> Statistics: Num rows: 10000 Data size: 478869 Basic stats: >> COMPLETE Column stats: NONE >> Map Join Operator >> condition map: >> Inner Join 0 to 1 >> condition expressions: >> 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6} >> 1 {col0} >> keys: >> 0 col0 (type: string) >> 1 col0 (type: string) >> outputColumnNames: _col0, _col1, _col2, _col3, _col4, >> _col5, _col6, _col9 >> Statistics: Num rows: 11000 Data size: 526755 Basic stats: >> COMPLETE Column stats: NONE >> Filter Operator >> predicate: (_col0 = _col9) (type: boolean) >> Statistics: Num rows: 5500 Data size: 263377 Basic stats: >> COMPLETE Column stats: NONE >> Select Operator >> expressions: _col0 (type: string), _col1 (type: >> string), _col2 (type: string), _col3 (type: string), _col4 (type: string), >> _col5 (type: string), _col6 (type: string) >> outputColumnNames: _col0, _col1, _col2, _col3, _col4, >> _col5, _col6 >> Statistics: Num rows: 5500 Data size: 263377 Basic >> stats: COMPLETE Column stats: NONE >> File Output Operator >> compressed: false >> Statistics: Num rows: 5500 Data size: 263377 Basic >> stats: COMPLETE Column stats: NONE >> table: >> input format: >> org.apache.hadoop.mapred.TextInputFormat >> output format: >> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat >> serde: >> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe >> Local Work: >> Map Reduce Local Work >> Stage: Stage-0 >> Fetch Operator >> limit: -1 >> Time taken: 0.171 seconds, Fetched: 63 row(s) > > So we only see normal Map Join operator. > > *My question is :* > > > - What operator will Bucket map join show in explain output? > - The same as normal Map Join?If so, how do we know if the SQL is > using bucket map join or normal map join? > > Thanks. > > > Thanks, > > Hao >