Simhadri Govindappa created HIVE-26673: ------------------------------------------
Summary: Incorrect row count when vectorisation is enabled
Key: HIVE-26673
URL: https://issues.apache.org/jira/browse/HIVE-26673
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0-alpha-2
Reporter: Simhadri Govindappa

Repro:
{noformat}
select count(*) from (
  SELECT T0.plant_no, T0.part_chain, T0.part_new, T0.part_no
  FROM dm_ads_dims_prod.cloudera_test3 T0
  LEFT JOIN (
    SELECT T0.plant_no, T0.part_chain
    FROM (
      SELECT T0.plant_no, T0.part_chain, count(*) AS ct
      FROM dm_ads_dims_prod.cloudera_test3 T0
      WHERE purchase_pos = pos
      GROUP BY T0.plant_no, T0.part_chain
    ) T0
    WHERE ct = 2
  ) T1
  ON T0.plant_no = T1.plant_no AND T0.part_chain = T1.part_chain
  WHERE T0.purchase_pos = T0.pos
    AND (T1.part_chain IS NULL OR (T1.part_chain IS NOT NULL AND T0.fd = 1))
) s;
{noformat}

Run the query a few times on the repro cluster with the following settings:
{code:sql}
set hive.query.results.cache.enabled=false;
set hive.compute.query.using.stats=false;
set hive.auto.convert.join=true;
{code}
The results differed from run to run:
{noformat}
2682424
2682426
2682425
{noformat}

Then turn off {{hive.auto.convert.join}}:
{code:sql}
set hive.query.results.cache.enabled=false;
set hive.compute.query.using.stats=false;
set hive.auto.convert.join=false;
{code}
With this setting the result was always *2682420*.

Comparing the plans with {{hive.auto.convert.join}} enabled and disabled, the difference is the join type: Map join when enabled, Merge join when disabled.

Vectorization also plays a role. When it is turned off, the result is correct:
{code:sql}
SET hive.vectorized.execution.enabled=false;
{code}
This is only a workaround and has a negative impact on performance, but it should help narrow down where the cause of the issue lies.
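To narrow the cause further, the repro can be rerun under every on/off combination of the two settings identified above ({{hive.auto.convert.join}} and {{hive.vectorized.execution.enabled}}), with the cache and stats settings held fixed. The following is a minimal sketch under these assumptions. The helper names {{build_scripts}} and {{summarize}} are hypothetical. The generated text is plain HiveQL that you would run yourself through any Hive client, such as Beeline, and the caller supplies the expected count.

```python
from itertools import product

# Settings held fixed in every run, taken from the repro steps.
FIXED_SETTINGS = {
    "hive.query.results.cache.enabled": "false",
    "hive.compute.query.using.stats": "false",
}

# The two settings the report identifies as changing the result.
VARIED_SETTINGS = ["hive.auto.convert.join", "hive.vectorized.execution.enabled"]


def build_scripts(query: str) -> dict[tuple[str, ...], str]:
    """Return one HiveQL script per on/off combination of VARIED_SETTINGS.

    Keys are tuples of values in VARIED_SETTINGS order, e.g.
    ("false", "true") means auto.convert.join=false, vectorization=true.
    """
    scripts = {}
    for values in product(["true", "false"], repeat=len(VARIED_SETTINGS)):
        settings = dict(FIXED_SETTINGS)
        settings.update(zip(VARIED_SETTINGS, values))
        preamble = "".join(f"set {key}={value};\n" for key, value in settings.items())
        scripts[values] = preamble + query.strip() + "\n"
    return scripts


def summarize(counts: dict[tuple[str, ...], list[int]], expected: int) -> dict[tuple[str, ...], str]:
    """Label each combination from the counts observed over repeated runs."""
    labels = {}
    for combo, observed in counts.items():
        distinct = set(observed)
        if not distinct:
            labels[combo] = "no data"
        elif len(distinct) > 1:
            # Different counts across runs of the same query and settings.
            labels[combo] = "nondeterministic"
        elif distinct == {expected}:
            labels[combo] = "ok"
        else:
            labels[combo] = "wrong"
    return labels
```

Separating "nondeterministic" from "wrong" matters here. Varying counts across identical runs (as seen with the Map join) point to a different class of bug than a count that is stable but incorrect.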