Re: What is the difference in explain output between Bucket Map Join and normal Map Join?

Hao Zhu Wed, 19 Nov 2014 14:48:46 -0800

Anyone knows the answer?

Thanks.


On Tue, Nov 18, 2014 at 2:20 PM, Hao Zhu <h...@maprtech.com> wrote:

> Hi All,
>
> I am testing the different explain plan output and found it hard to find
> the difference between Bucket Map Join(non sort merged) and normal Map Join.
>
> The tested hive version is 0.13.
> To enable Bucket Map Join, there are 2 conditions:
> 1. set hive.optimize.bucketmapjoin = true;
> 2. If the tables being joined are bucketized on the join columns, and the
> number of buckets in one table is a multiple of the number of buckets in
> the other table, the buckets can be joined with each other.
>
> Here are my tests:
> *1. Create 2 Bucketed Tables(non-sorted.)*
> create table b1(col0 string,col1 string,col2 string,col3 string,col4
> string,col5 string,col6 string)
> clustered by (col0) into 32 buckets;
> create table b2(col0 string,col1 string,col2 string,col3 string,col4
> string,col5 string,col6 string)
> clustered by (col0) into 8 buckets;
> set hive.enforce.bucketing = true;
> From passwords insert OVERWRITE  table b1 select * limit 10000;
> From passwords insert OVERWRITE  table b2 select * limit 10000;
>
> *2. set hive.optimize.bucketmapjoin = true;*
>
>> hive> explain  select  b1.* from b1,b2 where b1.col0=b2.col0;
>> OK
>> STAGE DEPENDENCIES:
>>   Stage-4 is a root stage
>>   Stage-3 depends on stages: Stage-4
>>   Stage-0 is a root stage
>> STAGE PLANS:
>>   Stage: Stage-4
>>     Map Reduce Local Work
>>       Alias -> Map Local Tables:
>>         b1
>>           Fetch Operator
>>             limit: -1
>>       Alias -> Map Local Operator Tree:
>>         b1
>>           TableScan
>>             alias: b1
>>             Statistics: Num rows: 10000 Data size: 478869 Basic stats:
>> COMPLETE Column stats: NONE
>>             HashTable Sink Operator
>>               condition expressions:
>>                 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6}
>>                 1 {col0}
>>               keys:
>>                 0 col0 (type: string)
>>                 1 col0 (type: string)
>>   Stage: Stage-3
>>     Map Reduce
>>       Map Operator Tree:
>>           TableScan
>>             alias: b2
>>             Statistics: Num rows: 10000 Data size: 478869 Basic stats:
>> COMPLETE Column stats: NONE
>>             Map Join Operator
>>               condition map:
>>                    Inner Join 0 to 1
>>               condition expressions:
>>                 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6}
>>                 1 {col0}
>>               keys:
>>                 0 col0 (type: string)
>>                 1 col0 (type: string)
>>               outputColumnNames: _col0, _col1, _col2, _col3, _col4,
>> _col5, _col6, _col9
>>               Statistics: Num rows: 11000 Data size: 526755 Basic stats:
>> COMPLETE Column stats: NONE
>>               Filter Operator
>>                 predicate: (_col0 = _col9) (type: boolean)
>>                 Statistics: Num rows: 5500 Data size: 263377 Basic stats:
>> COMPLETE Column stats: NONE
>>                 Select Operator
>>                   expressions: _col0 (type: string), _col1 (type:
>> string), _col2 (type: string), _col3 (type: string), _col4 (type: string),
>> _col5 (type: string), _col6 (type: string)
>>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4,
>> _col5, _col6
>>                   Statistics: Num rows: 5500 Data size: 263377 Basic
>> stats: COMPLETE Column stats: NONE
>>                   File Output Operator
>>                     compressed: false
>>                     Statistics: Num rows: 5500 Data size: 263377 Basic
>> stats: COMPLETE Column stats: NONE
>>                     table:
>>                         input format:
>> org.apache.hadoop.mapred.TextInputFormat
>>                         output format:
>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>                         serde:
>> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>       Local Work:
>>         Map Reduce Local Work
>>   Stage: Stage-0
>>     Fetch Operator
>>       limit: -1
>> Time taken: 0.171 seconds, Fetched: 63 row(s)
>
> So we only see normal Map Join operator.
>
> *My question is :*
>
>
>    - What operator will Bucket map join show in explain output?
>    - The same as normal Map Join?If so, how do we know if the SQL is
>    using bucket map join or normal map join?
>
> Thanks.
>
>
> Thanks,
>
> Hao
>

Re: What is the difference in explain output between Bucket Map Join and normal Map Join?

Reply via email to