[ https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Mior updated CALCITE-481:
---------------------------------
    Description: 
If a sub-tree occurs more than once in a query, an efficient plan would probably 
evaluate it once and have two readers read the same data. We propose a "Spool" 
relational expression for this purpose.

Spool would have one input, the expression that populates it.

In the VolcanoPlanner, any RelNode can already have multiple consumers (each of 
which sees the same row type and the same data) but an optimal plan does not 
typically include multiple uses of the same node, so most implementors (e.g. 
EnumerableRelImplementor) would just not notice, and generate the same code 
twice. Having an explicit Spool would alert the implementor to re-use the 
result.
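
To make the re-use idea concrete, here is a minimal sketch in plain Java. It 
is not Calcite code: the class SpoolSketch, its nested Spool class and the 
evaluations() counter are hypothetical names invented for this illustration. 
It assumes the input fits in memory and is buffered in a list the first time 
any consumer reads it; as noted in this description, the actual mechanism is 
not prescribed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration of the Spool idea; not part of Calcite. */
public class SpoolSketch {

  /** Has one input; evaluates it on first read and replays it afterwards. */
  static final class Spool<T> implements Iterable<T> {
    private final Supplier<Iterator<T>> input;
    private List<T> buffer;   // null until the input has been evaluated
    private int evaluations;  // how many times the input was actually run

    Spool(Supplier<Iterator<T>> input) {
      this.input = input;
    }

    @Override public synchronized Iterator<T> iterator() {
      if (buffer == null) {
        // First consumer: run the input once and keep its rows.
        List<T> rows = new ArrayList<>();
        input.get().forEachRemaining(rows::add);
        buffer = Collections.unmodifiableList(rows);
        evaluations++;
      }
      // Every consumer, including the first, reads the buffered rows.
      return buffer.iterator();
    }

    synchronized int evaluations() {
      return evaluations;
    }
  }

  public static void main(String[] args) {
    Spool<Integer> spool = new Spool<>(() -> Arrays.asList(1, 2, 3).iterator());
    int total = 0;
    for (int row : spool) {  // first reader populates the buffer
      total += row;
    }
    for (int row : spool) {  // second reader replays it
      total += row;
    }
    System.out.println("total=" + total
        + ", input evaluations=" + spool.evaluations());
  }
}
```

Both readers see the same rows, and the input is evaluated only once. An 
implementor that generates the same code twice would instead run the input 
once per reader.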

We do not prescribe a mechanism for implementing Spool as a physical operator. 
A job that populates a temporary table is one possible mechanism.

As part of this case, we should implement Spool in Enumerable convention, and 
use it to evaluate some test queries.

The other reason to implement Spool is costing. The cost of a Spool with N 
consumers is typically something like A + B * N, where A, the fixed cost, is 
significantly larger than B, the per-consumer re-play cost.
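
As a rough worked example of this cost shape, the sketch below compares a 
Spool against re-evaluating the sub-tree for every consumer. The class and 
method names are hypothetical, the numbers are made up for illustration, and 
the per-consumer re-evaluation cost C is an assumption added here for the 
comparison; it is not part of the formula above.

```java
/** Hypothetical cost comparison; the numbers are illustrative only. */
public class SpoolCostSketch {

  /** Cost of a Spool with n consumers: fixed cost a plus re-play cost b each. */
  static double spoolCost(double a, double b, int n) {
    return a + b * n;
  }

  /** Assumed cost without a Spool: every consumer re-evaluates at cost c. */
  static double recomputeCost(double c, int n) {
    return c * n;
  }

  public static void main(String[] args) {
    double a = 100; // fixed cost of populating the spool (assumed)
    double b = 5;   // re-play cost per consumer (assumed)
    double c = 60;  // cost of evaluating the sub-tree once (assumed)
    for (int n = 1; n <= 3; n++) {
      System.out.println("N=" + n
          + " spool=" + spoolCost(a, b, n)
          + " recompute=" + recomputeCost(c, n));
    }
  }
}
```

With these made-up numbers the Spool costs more for a single consumer (105 
versus 60) and less from two consumers onward (110 versus 120). In general, 
under this assumption the Spool wins once A + B * N < C * N, that is, once 
N > A / (C - B) for C > B.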

Volcano's dynamic programming model does not make it easy to account for 
re-use. There are approaches in academia based on integer linear programming; 
see e.g. http://www.slideshare.net/INRIA-OAK/plreuse and 
https://hal.inria.fr/hal-01353891/document.

  was:
If a sub-tree occurs more than once in a query an efficient plan would probably 
evaluate once and have two readers read the same data. We propose a "Spool" 
relational expression for this purpose.

Spool would have one input, the expression that populates it.

In the VolcanoPlanner, any RelNode can already have multiple consumers (each of 
which sees the same row type and the same data) but an optimal plan does not 
typically include multiple uses of the same node, so most implementors (e.g. 
EnumerableRelImplementor) would just not notice, and generate the same code 
twice. Having an explicit Spool would alert the implementor to re-use the 
result.

We do not prescribe a mechanism for implementing Spool as a physical operator. 
A job that populates a temporary table is one possible mechanism.

As part of this case, we should implement Spool in Enumerable convention, and 
use it to evaluate some test queries.

The other reason to implement Spool is costing. The cost of a Spool with N 
consumers is typically something like A + B . N. A, the fixed cost, is 
significantly larger than B, the re-play cost.

Volcano's dynamic programming model does not make it easy to account for 
re-use. There are approaches in academia based on integer linear programming; 
see e.g. http://www.slideshare.net/INRIA-OAK/plreuse 


> Add "Spool" operator, to allow re-use of relational expressions
> ---------------------------------------------------------------
>
>                 Key: CALCITE-481
>                 URL: https://issues.apache.org/jira/browse/CALCITE-481
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>
> If a sub-tree occurs more than once in a query an efficient plan would 
> probably evaluate once and have two readers read the same data. We propose a 
> "Spool" relational expression for this purpose.
> Spool would have one input, the expression that populates it.
> In the VolcanoPlanner, any RelNode can already have multiple consumers (each 
> of which sees the same row type and the same data) but an optimal plan does 
> not typically include multiple uses of the same node, so most implementors 
> (e.g. EnumerableRelImplementor) would just not notice, and generate the same 
> code twice. Having an explicit Spool would alert the implementor to re-use 
> the result.
> We do not prescribe a mechanism for implementing Spool as a physical 
> operator. A job that populates a temporary table is one possible mechanism.
> As part of this case, we should implement Spool in Enumerable convention, and 
> use it to evaluate some test queries.
> The other reason to implement Spool is costing. The cost of a Spool with N 
> consumers is typically something like A + B . N. A, the fixed cost, is 
> significantly larger than B, the re-play cost.
> Volcano's dynamic programming model does not make it easy to account for 
> re-use. There are approaches in academia based on integer linear programming; 
> see e.g. http://www.slideshare.net/INRIA-OAK/plreuse and 
> https://hal.inria.fr/hal-01353891/document.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
