[ https://issues.apache.org/jira/browse/HIVE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline resolved HIVE-9068.
--------------------------------
    Resolution: Incomplete

> Hive : With CBO disabled Vectorization in Map join disabled causing 100% increase in elapsed time and CPU (possibly due to redundant filter operator)
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9068
>                 URL: https://issues.apache.org/jira/browse/HIVE-9068
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Matt McCline
>            Priority: Major
>             Fix For: 0.14.1
>
>
> With CBO off there is a redundant filter operator
> {code}
> Filter Operator
>   predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
> {code}
> Possibly this is why Vectorization is getting disabled with CBO off, this operator doesn't exist with CBO on.
> Query
> {code}
> select
>     count(*)
> from
>     (SELECT
>         'store' as channel,
>         'ss_addr_sk' col_name,
>         d_year,
>         d_qoy,
>         i_category,
>         ss_ext_sales_price ext_sales_price
>     FROM
>         store_sales, item, date_dim
>     WHERE
>         ss_addr_sk IS NULL
>         AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
>         AND store_sales.ss_item_sk = item.i_item_sk) a;
> {code}
> Explain with CBO OFF
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: mmokhtar_20141210171212_02c36f60-ceea-4e18-a266-5baecfd023f2:6
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   filterExpr: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
>                   Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
>                     Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {ss_item_sk} {ss_sold_date_sk}
>                         1 {i_item_sk}
>                       keys:
>                         0 ss_item_sk (type: int)
>                         1 i_item_sk (type: int)
>                       outputColumnNames: _col1, _col22, _col26
>                       input vertices:
>                         1 Map 4
>                       Statistics: Num rows: 1946839936 Data size: 23362079232 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col1} {_col22} {_col26}
>                           1 {d_date_sk}
>                         keys:
>                           0 _col22 (type: int)
>                           1 d_date_sk (type: int)
>                         outputColumnNames: _col1, _col22, _col26, _col51
>                         input vertices:
>                           1 Map 3
>                         Statistics: Num rows: 2176800197 Data size: 34828803152 Basic stats: COMPLETE Column stats: COMPLETE
>                         Filter Operator
>                           predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
>                           Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
>                           Select Operator
>                             Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
>                             Group By Operator
>                               aggregations: count()
>                               mode: hash
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                               Reduce Output Operator
>                                 sort order:
>                                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                                 value expressions: _col0 (type: bigint)
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: d_date_sk is not null (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: d_date_sk is not null (type: boolean)
>                     Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: d_date_sk (type: int)
>                       sort order: +
>                       Map-reduce partition columns: d_date_sk (type: int)
>                       Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                       Group By Operator
>                         keys: _col0 (type: int)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
>                         Dynamic Partitioning Event Operator
>                           Target Input: store_sales
>                           Partition key expr: ss_sold_date_sk
>                           Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
>                           Target column: ss_sold_date_sk
>                           Target Vertex: Map 1
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: item
>                   filterExpr: i_item_sk is not null (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: i_item_sk is not null (type: boolean)
>                     Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Reduce Output Operator
>                       key expressions: i_item_sk (type: int)
>                       sort order: +
>                       Map-reduce partition columns: i_item_sk (type: int)
>                       Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: bigint)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Explain with CBO on
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName: mmokhtar_20141210171212_495d0eb9-d176-43d3-8101-84821a0c0fdf:5
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   filterExpr: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
>                   Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
>                     Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: ss_item_sk (type: int), ss_sold_date_sk (type: int)
>                       outputColumnNames: _col0, _col2
>                       Statistics: Num rows: 1946839900 Data size: 15574719200 Basic stats: COMPLETE Column stats: COMPLETE
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0
>                           1 {_col2}
>                         keys:
>                           0 _col0 (type: int)
>                           1 _col0 (type: int)
>                         outputColumnNames: _col3
>                         input vertices:
>                           0 Map 4
>                         Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
>                         Select Operator
>                           expressions: _col3 (type: int)
>                           outputColumnNames: _col3
>                           Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
>                           Map Join Operator
>                             condition map:
>                                  Inner Join 0 to 1
>                             condition expressions:
>                               0
>                               1
>                             keys:
>                               0 _col0 (type: int)
>                               1 _col3 (type: int)
>                             input vertices:
>                               0 Map 3
>                             Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
>                             Select Operator
>                               Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
>                               Group By Operator
>                                 aggregations: count()
>                                 mode: hash
>                                 outputColumnNames: _col0
>                                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                                 Reduce Output Operator
>                                   sort order:
>                                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                                   value expressions: _col0 (type: bigint)
>             Execution mode: vectorized
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   filterExpr: d_date_sk is not null (type: boolean)
>                   Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: d_date_sk is not null (type: boolean)
>                     Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: d_date_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                       Select Operator
>                         expressions: _col0 (type: int)
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
>                         Group By Operator
>                           keys: _col0 (type: int)
>                           mode: hash
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
>                           Dynamic Partitioning Event Operator
>                             Target Input: store_sales
>                             Partition key expr: ss_sold_date_sk
>                             Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
>                             Target column: ss_sold_date_sk
>                             Target Vertex: Map 1
>             Execution mode: vectorized
>         Map 4
>             Map Operator Tree:
>                 TableScan
>                   alias: item
>                   filterExpr: i_item_sk is not null (type: boolean)
>                   Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator
>                     predicate: i_item_sk is not null (type: boolean)
>                     Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
>                     Select Operator
>                       expressions: i_item_sk (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
>             Execution mode: vectorized
>         Reducer 2
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 Select Operator
>                   expressions: _col0 (type: bigint)
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 3.874 seconds, Fetched: 144 row(s)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)