Joe McDonnell created IMPALA-13181:
--------------------------------------

             Summary: Disable tuple caching for locations that have a limit
                 Key: IMPALA-13181
                 URL: https://issues.apache.org/jira/browse/IMPALA-13181
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.5.0
            Reporter: Joe McDonnell


A statement that uses a limit without a sort is non-deterministic: the rows it
returns can vary from run to run. Locations with limits should therefore be
marked ineligible for tuple caching.
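
A minimal sketch of the proposed rule, written in Python for illustration only. The PlanNode class and ineligible_locations function are hypothetical names, not Impala's frontend classes. The sketch also assumes that ineligibility propagates upward, since a cached location would capture the output of its whole subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical stand-in for a plan node (not Impala's PlanNode class)."""
    name: str
    limit: Optional[int] = None  # None means the node has no limit
    children: List["PlanNode"] = field(default_factory=list)


def ineligible_locations(root: PlanNode) -> List[str]:
    """Return the names of locations that should not be cached.

    Assumption: a location is ineligible if the node itself or any
    descendant carries a limit.
    """
    result: List[str] = []

    def visit(node: PlanNode) -> bool:
        has_limit = node.limit is not None
        for child in node.children:
            # Visit every child (no short-circuit) so each one is recorded.
            if visit(child):
                has_limit = True
        if has_limit:
            result.append(node.name)
        return has_limit

    visit(root)
    return result


# Plan shape loosely modeled on a join whose build side has a limit.
limited_scan = PlanNode("scan lineitem", limit=10)
orders_scan = PlanNode("scan orders")
join = PlanNode("hash join", children=[limited_scan, orders_scan])

print(ineligible_locations(join))  # ['scan lineitem', 'hash join']
```

In this sketch the scan of orders remains eligible, because nothing in its subtree has a limit.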

As an example, for a hash join, suppose the build side has a limit. This means
that the contents of the build side could vary from run to run. A requirement
for our correctness is that all nodes agree on the contents of the build side.
Caching breaks that requirement: if one node hits the cache and another does
not, the first node uses the rows chosen by an earlier run while the second
recomputes the limit, and there is no guarantee that the two sets of rows
match.
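
The disagreement can be illustrated with a small model, written in Python for illustration only. It is not Impala code: apply_limit and build_side are hypothetical helpers, and the arrival orders are made-up stand-ins for the scheduling-dependent order in which rows reach the limit.

```python
from typing import FrozenSet, Iterable, Optional


def apply_limit(arrival_order: Iterable[int], limit: int) -> FrozenSet[int]:
    """Model an unsorted limit: keep whichever rows happen to arrive first."""
    return frozenset(list(arrival_order)[:limit])


def build_side(cached: Optional[FrozenSet[int]],
               arrival_order: Iterable[int],
               limit: int) -> FrozenSet[int]:
    """Return the cached rows on a hit, otherwise recompute the limit."""
    if cached is not None:
        return cached
    return apply_limit(arrival_order, limit)


# Two runs see the same rows in a different arrival order.
first_run_order = [1, 2, 3, 4, 5, 6]
second_run_order = [6, 5, 4, 3, 2, 1]

# The first run populates the cache entry.
cache_entry = apply_limit(first_run_order, 3)

# During the second run, node A hits the cache and node B misses.
node_a = build_side(cache_entry, second_run_order, 3)
node_b = build_side(None, second_run_order, 3)

print(node_a == node_b)  # False
```

Without the cache, both nodes would see the rows produced by the same run; the cache entry from an earlier run is what introduces the mismatch in this model.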

Concrete example: 
{noformat}
select a.l_orderkey
from (select l_orderkey from tpch_parquet.lineitem limit 10) a,
     tpch_parquet.orders b
where a.l_orderkey = b.o_orderkey;
{noformat}
There are times when limits are deterministic or the non-determinism is
harmless. It is safer to ban caching for them completely at first. In a future
change, this rule can be relaxed to allow caching in those cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
