As you have observed, today, multiple references to a relation results in multiple scans of the relation -- so that is the expected behavior.
Most system that materialize CTEs with >1 reference to them do so via temporary tables behind the scenes. That said, such functionality does not exist in Impala today, but there is a JIRA for it: https://issues.apache.org/jira/browse/IMPALA-8083 On Thu, Jun 27, 2019 at 9:50 AM Boris Tyukin <[email protected]> wrote: > Hello, > > a lot of our SQL users love CTEs (Common Table Expressions), including > myself. CTEs make code easier to read, maintain and normally have very good > performance in traditional RDBMS like SQL Server or Oracle (and our SQL > users happen to have a background with those systems). > > While CTEs work fine in Impala, it seems to me that Impala does not > materialize CTEs temporarily, if CTE is used in more than one place in a > query, and instead get expanded and all the joins repeated in every place > CTE results are used. > > Is this expected behavior and is there any good hints (not hacks) to force > Impala to materialize CTEs if they are used in several places in the same > query? > > I did not find anything concerning CTE performance in Impala docs and only > found this thread which does look to me like a hack than a solution > > https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Performance-Reduced-after-Removing-ORDER-BY-clause/td-p/28832 > > > Thanks, > Boris >
