kasakrisz commented on pull request #1324:
URL: https://github.com/apache/hive/pull/1324#issuecomment-666155487


   @HunterL
   Thanks for reviewing this patch.
   The expression
   ```
   filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null)
   ```
   is the result of merging two `TableScanOperator`s. Both of them scan 
the same table (`alias: s`), but they had different `filterExpr`s:
   TS1: ((s_floor_space > 1000) and s_store_sk is not null)
   TS2: s_store_sk is not null
   
   SharedWorkOptimizer naively combined the filter expressions using `or` 
because we need the union of the records produced by both TS. You are right: in 
this particular case the expression has the form `(A and B) or B`, which is 
equivalent to `B`, so it could be reduced to `s_store_sk is not null`.
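
As a quick sanity check of that reduction, here is a minimal Python sketch (not Hive code) that enumerates every combination under SQL-style three-valued logic, with `None` standing in for NULL. The helper names `and3` and `or3` are made up for this illustration; `a` plays the role of `s_floor_space > 1000` and `b` the role of `s_store_sk is not null`.

```python
from itertools import product

# SQL-style three-valued logic: True, False, or None (unknown / NULL).
VALUES = (True, False, None)

def and3(a, b):
    """Three-valued AND: False dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: True dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

# Check that (a and b) or b evaluates to b for every combination of inputs.
absorption_holds = all(
    or3(and3(a, b), b) is b for a, b in product(VALUES, VALUES)
)
print(absorption_holds)
```

So the simplification should be safe even when `s_floor_space` is NULL, at least under this model of the predicate semantics.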
   
   The new TS has two children:
   ```
   TableScan
     alias: s
     filterExpr: (((s_floor_space > 1000) and s_store_sk is not null) or s_store_sk is not null) (type: boolean)
     Filter Operator
       predicate: ((s_floor_space > 1000) and s_store_sk is not null) (type: boolean)
       ...
     Filter Operator
       predicate: s_store_sk is not null (type: boolean)
       ...
   ```
   Both of them are `Filter Operator`s, and each is the root of a subtree that 
broadcasts the proper subset of records to its reducer edge (Reducer 2 and 
Reducer 3).
   
   If `and` were used to combine the filter expressions of the TS operators, the 
branch that does not have the filter `s_floor_space > 1000` would lose a 
subset of its records.
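
To illustrate the difference, here is a small Python simulation (a sketch only, not how Hive executes the plan). The sample rows and the helper names `ts1_filter`, `ts2_filter` and `branch_outputs` are invented for the example; each row is `(s_store_sk, s_floor_space)`.

```python
# Hypothetical rows of table s: (s_store_sk, s_floor_space).
rows = [(1, 500), (2, 1500), (None, 2000), (3, None)]

def ts1_filter(row):
    """(s_floor_space > 1000) and s_store_sk is not null"""
    sk, space = row
    return space is not None and space > 1000 and sk is not None

def ts2_filter(row):
    """s_store_sk is not null"""
    sk, _ = row
    return sk is not None

def branch_outputs(scan_filter):
    """Apply the merged scan filter, then each branch's own Filter Operator."""
    scanned = [r for r in rows if scan_filter(r)]
    branch1 = [r for r in scanned if ts1_filter(r)]
    branch2 = [r for r in scanned if ts2_filter(r)]
    return branch1, branch2

# Merged scan filter combined with `or` versus `and`.
or_branches = branch_outputs(lambda r: ts1_filter(r) or ts2_filter(r))
and_branches = branch_outputs(lambda r: ts1_filter(r) and ts2_filter(r))

# What the second branch would have received from its own, unmerged scan.
expected_ts2 = [r for r in rows if ts2_filter(r)]

print(or_branches[1] == expected_ts2)   # `or` keeps every row branch 2 needs
print(and_branches[1] == expected_ts2)  # `and` drops rows before branch 2 sees them
```

With these sample rows, the `and` variant leaves the second branch with only `(2, 1500)`, while the `or` variant still gives it all three rows whose `s_store_sk` is not null.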

