As you have observed, today, multiple references to a relation results in
multiple scans of the relation -- so that is the expected behavior.

Most system that materialize CTEs with >1 reference to them do so via
temporary tables behind the scenes.  That said, such functionality does not
exist in Impala today, but there is a JIRA for it:
https://issues.apache.org/jira/browse/IMPALA-8083



On Thu, Jun 27, 2019 at 9:50 AM Boris Tyukin <[email protected]> wrote:

> Hello,
>
> a lot of our SQL users love CTEs (Common Table Expressions), including
> myself. CTEs make code easier to read, maintain and normally have very good
> performance in traditional RDBMS like SQL Server or Oracle (and our SQL
> users happen to have a background with those systems).
>
> While CTEs work fine in Impala, it seems to me that Impala does not
> materialize CTEs temporarily, if CTE is used in more than one place in a
> query, and instead get expanded and all the joins repeated in every place
> CTE results are used.
>
> Is this expected behavior and is there any good hints (not hacks) to force
> Impala to materialize CTEs if they are used in several places in the same
> query?
>
> I did not find anything concerning CTE performance in Impala docs and only
> found this thread which does look to me like a hack than a solution
>
> https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Performance-Reduced-after-Removing-ORDER-BY-clause/td-p/28832
>
>
> Thanks,
> Boris
>

Reply via email to