Joe McDonnell created IMPALA-13181:
--------------------------------------

             Summary: Disable tuple caching for locations that have a limit
                 Key: IMPALA-13181
                 URL: https://issues.apache.org/jira/browse/IMPALA-13181
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.5.0
            Reporter: Joe McDonnell

Statements that use a limit return a non-deterministic set of rows unless there is a sort. Locations with limits should be marked ineligible for tuple caching.

As an example, for a hash join, suppose the build side has a limit. This means that the contents of the build side could vary from run to run. A requirement for correctness is that all nodes agree on the contents of the build side. The variability introduced by the limit is a problem because, if one node hits the cache and another does not, there is no guarantee that they agree on the contents of the build side.

Concrete example:
{noformat}
select a.l_orderkey
from (select l_orderkey from tpch_parquet.lineitem limit 10) a,
     tpch_parquet.orders b
where a.l_orderkey = b.o_orderkey;
{noformat}

There are times when limits are deterministic or the non-determinism is harmless. It is safer to ban it completely at first. In a future change, this rule can be relaxed to allow caching in those cases.
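
For illustration only, the conservative rule proposed here could be sketched as follows. This is not Impala's frontend code: `PlanNode` and `is_cache_eligible` are hypothetical names, and the upward propagation of ineligibility to parent nodes is an assumption of the sketch rather than something stated in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node; not Impala's actual PlanNode class."""
    name: str
    limit: Optional[int] = None  # None means the node has no limit
    children: List["PlanNode"] = field(default_factory=list)


def is_cache_eligible(node: PlanNode) -> bool:
    """Conservative rule: a location with a limit is never eligible."""
    if node.limit is not None:
        return False
    # Assumption of this sketch: a node is also ineligible when any child
    # is ineligible, because its output depends on that child's rows.
    return all(is_cache_eligible(child) for child in node.children)


# Plan shape loosely mirroring the concrete example in this issue:
# the build side of the hash join scans lineitem with "limit 10".
scan_lineitem = PlanNode("SCAN lineitem", limit=10)
scan_orders = PlanNode("SCAN orders")
hash_join = PlanNode("HASH JOIN", children=[scan_orders, scan_lineitem])

for plan_node in (scan_orders, scan_lineitem, hash_join):
    print(plan_node.name, is_cache_eligible(plan_node))
```

Under these assumptions, only the scan of `orders` stays eligible. A later relaxation of the rule (for example, allowing a limit that sits directly on a sort) would change only the first check in `is_cache_eligible`.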