Re: Streams Query Optimization Using Rate and Window Size

2019-07-15 Thread Julian Hyde
Row count is not the only statistic you can use in computing your cost function; you can use many statistics (e.g. selectivity, column cardinality, predicates). So, if you decide to use rows-per-minute as your cost function, you can compute the rows-per-minute of the join
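
A sketch of that kind of calculation (my own illustration, not code from the thread): rows-per-minute of a sliding-window stream-stream join, estimated from the input rates, the window sizes, and a join-selectivity statistic. All names and numbers are made up.

    // Illustration only: a rate-based estimate for a sliding-window join.
    // Each arriving left row probes the rows currently buffered in the right
    // window and vice versa, and 'selectivity' of the candidate pairs survive
    // the join predicate.
    public final class JoinRateSketch {

      /** Estimated output rows per minute of the windowed join. */
      static double joinRowsPerMinute(double leftRowsPerMinute, double leftWindowMinutes,
                                      double rightRowsPerMinute, double rightWindowMinutes,
                                      double selectivity) {
        return selectivity * leftRowsPerMinute * rightRowsPerMinute
            * (leftWindowMinutes + rightWindowMinutes);
      }

      public static void main(String[] args) {
        // 600 rows/min over a 2-minute window joined with 300 rows/min over a
        // 1-minute window at selectivity 0.001 -> 540 output rows per minute.
        System.out.println(joinRowsPerMinute(600, 2, 300, 1, 0.001));
      }
    }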

Re: Streams Query Optimization Using Rate and Window Size

2019-07-15 Thread Alireza Samadian
Hi Julian, Thank you for your reply. I think the problem with interpreting rowCount as the rate is that we need both the rate and the window size of the inputs to estimate the output rate of the RelNodes, and this pair cannot be embedded into a single number. As an example, let A and B be two windowed
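
A minimal sketch of the pair Alireza describes, propagated as a unit instead of a single rowCount. It assumes tumbling windows of equal size on both inputs (which is what Beam requires for joining unbounded inputs); the class and the numbers are illustrative only.

    // Illustration only: carry (rate, window size) up the plan as a pair.
    public final class RateAndWindow {
      final double rowsPerSecond;
      final double windowSeconds;

      RateAndWindow(double rowsPerSecond, double windowSeconds) {
        this.rowsPerSecond = rowsPerSecond;
        this.windowSeconds = windowSeconds;
      }

      /**
       * Estimate for a join of two streams with equal tumbling windows: each
       * pair of co-occurring windows is cross-joined once, and 'selectivity'
       * of the pairs survive the predicate.
       */
      static RateAndWindow join(RateAndWindow a, RateAndWindow b, double selectivity) {
        double pairsPerWindow =
            (a.rowsPerSecond * a.windowSeconds) * (b.rowsPerSecond * b.windowSeconds);
        double outputRowsPerSecond = selectivity * pairsPerWindow / a.windowSeconds;
        return new RateAndWindow(outputRowsPerSecond, a.windowSeconds);
      }

      public static void main(String[] args) {
        RateAndWindow a = new RateAndWindow(100, 60); // 100 rows/s, 60 s window
        RateAndWindow b = new RateAndWindow(10, 60);  //  10 rows/s, 60 s window
        RateAndWindow out = join(a, b, 0.001);
        // Prints 60.0 rows/s and a 60 s window; neither number alone
        // determines the other, which is why one scalar is not enough.
        System.out.println(out.rowsPerSecond + " rows/s, " + out.windowSeconds + " s window");
      }
    }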

Re: Streams Query Optimization Using Rate and Window Size

2019-07-12 Thread Kenneth Knowles
On Fri, Jul 12, 2019 at 5:43 PM Julian Hyde wrote: > In practice, the rowCount is just a number. So you can think of it as > rows-per-second if you are optimizing a continuous query. > > If you are using a table in a streaming query, does it have a “rows per > second”? Yes - it is the number of

Re: Streams Query Optimization Using Rate and Window Size

2019-07-12 Thread Julian Hyde
In practice, the rowCount is just a number. So you can think of it as rows-per-second if you are optimizing a continuous query. If you are using a table in a streaming query, does it have a “rows per second”? Yes - it is the number of rows in the table multiplied by the number of times per
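
The snippet is cut off, but the arithmetic being described appears to be a simple product; a sketch under that reading (the "scans per second" factor is my guess at the multiplier, not a quote from the message):

    // Illustration only: a bounded table in a continuous query modeled as a
    // rate. If the query re-reads the table scansPerSecond times each second,
    // it contributes rowCount * scansPerSecond rows per second to the cost.
    public final class TableAsRate {
      static double tableRowsPerSecond(double tableRowCount, double scansPerSecond) {
        return tableRowCount * scansPerSecond;
      }

      public static void main(String[] args) {
        // A 1,000,000-row table re-scanned once every 10 seconds behaves like
        // a 100,000 rows-per-second input for costing purposes.
        System.out.println(tableRowsPerSecond(1_000_000, 0.1));
      }
    }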

Re: Streams Query Optimization Using Rate and Window Size

2019-07-10 Thread Stamatis Zampetakis
Looking forward to the outcome :) Below are a few comments regarding Kenn's extensibility concerns. In order to find the best plan, the VolcanoPlanner just needs to know whether one cost is less than another [1], and this is encapsulated in the isLe/isLt methods [2]. Adding a new cost class
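
A standalone sketch of that idea: the ordering defined by isLe/isLt is all the planner ultimately consumes, so a rate-and-window cost mainly has to define that comparison. A real version would implement org.apache.calcite.plan.RelOptCost and its full method set (plus, minus, multiplyBy, isInfinite, ...); the free-standing class below and its order-by-rate policy are illustrative choices only.

    // Illustration only: the comparison the planner relies on, for a cost
    // that carries both a rate and a window size. A real implementation
    // would implement org.apache.calcite.plan.RelOptCost instead of being a
    // free-standing class.
    public final class RateWindowCost {
      final double rowsPerSecond;  // output rate of the subtree
      final double windowSeconds;  // window size, kept for deriving rates

      RateWindowCost(double rowsPerSecond, double windowSeconds) {
        this.rowsPerSecond = rowsPerSecond;
        this.windowSeconds = windowSeconds;
      }

      /** One possible policy: order plans by output rate only. */
      boolean isLt(RateWindowCost other) {
        return this.rowsPerSecond < other.rowsPerSecond;
      }

      boolean isLe(RateWindowCost other) {
        return this.rowsPerSecond <= other.rowsPerSecond;
      }

      public static void main(String[] args) {
        RateWindowCost a = new RateWindowCost(500, 60);
        RateWindowCost b = new RateWindowCost(800, 60);
        System.out.println(a.isLe(b)); // true: plan a is no more expensive
      }
    }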

Re: Streams Query Optimization Using Rate and Window Size

2019-07-10 Thread Alireza Samadian
Dear Stamatis, Thank you for your reply. I will probably go with overriding computeSelfCost() as the first step; I checked it, and it seems to be working. Dear Kenn, The cited paper estimates those two values for each node and passes them up, but they are not the cost. The cost of a node depends on
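
For readers following along, a sketch of what overriding computeSelfCost() on a custom rel can look like. The class and its selectivity field are hypothetical (this is not Beam's code), and it simply folds a rate estimate into Calcite's standard (rows, cpu, io) cost triple.

    import org.apache.calcite.plan.RelOptCluster;
    import org.apache.calcite.plan.RelOptCost;
    import org.apache.calcite.plan.RelOptPlanner;
    import org.apache.calcite.plan.RelTraitSet;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.SingleRel;
    import org.apache.calcite.rel.metadata.RelMetadataQuery;

    /**
     * Hypothetical streaming filter rel that reinterprets rowCount as a rate,
     * as discussed in this thread. Illustration only, not Beam's implementation.
     */
    public class StreamFilterRelSketch extends SingleRel {

      private final double selectivity; // assumed to come from our own stats

      protected StreamFilterRelSketch(RelOptCluster cluster, RelTraitSet traits,
                                      RelNode input, double selectivity) {
        super(cluster, traits, input);
        this.selectivity = selectivity;
      }

      @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
                                                  RelMetadataQuery mq) {
        // Treat the input's row count as its rate and scale it by selectivity.
        double inputRate = mq.getRowCount(getInput());
        double outputRate = inputRate * selectivity;
        // Fold the estimate into the standard (rows, cpu, io) cost triple.
        return planner.getCostFactory().makeCost(outputRate, inputRate, 0);
      }
    }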

Re: Streams Query Optimization Using Rate and Window Size

2019-07-10 Thread Kenneth Knowles
Following this discussion, I have a question which I think is on topic. It seems like there are two places that, from my brief reading, are not quite extensible enough. 1. RelNode.computeSelfCost returns RelOptCost, which has particular measures built in. Would Alireza's proposal require extensibility here to
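
For context on point 1, this is roughly the shape of the interface Kenn refers to (abridged from memory; check the Calcite sources for the exact method set in your version):

    // Abridged sketch of org.apache.calcite.plan.RelOptCost. The measures are
    // fixed (rows, cpu, io), which is the extensibility question: a rate or a
    // window size has no slot of its own.
    public interface RelOptCost {
      double getRows();               // row count (or rate, if reinterpreted)
      double getCpu();                // CPU use
      double getIo();                 // I/O use
      boolean isLe(RelOptCost cost);  // comparisons used for plan selection
      boolean isLt(RelOptCost cost);
      RelOptCost plus(RelOptCost cost);
      RelOptCost multiplyBy(double factor);
      // ... equality and infinity checks, minus, divideBy, toString
    }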

Re: Streams Query Optimization Using Rate and Window Size

2019-07-08 Thread Stamatis Zampetakis
Hi Alireza, Cost models for streams are a very cool topic, but I don't have much knowledge in the domain. Regarding the implementation details, if you have custom physical operators then it makes sense to implement the computeSelfCost() function as you see fit. Another option is to plug in your custom
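
The message is cut off, so the following is only one possible reading of the second option: VolcanoPlanner can be constructed with a cost factory of your choosing, so swapping the factory changes the cost objects (and their isLe/isLt semantics) everywhere at once. The sketch uses Calcite's simple built-in RelOptCostImpl.FACTORY as a stand-in for a rate-aware factory; check the API of your Calcite version.

    import org.apache.calcite.plan.Contexts;
    import org.apache.calcite.plan.RelOptCostFactory;
    import org.apache.calcite.plan.RelOptCostImpl;
    import org.apache.calcite.plan.volcano.VolcanoPlanner;

    public final class PlannerWithCustomCostFactory {
      public static void main(String[] args) {
        // Stand-in factory: RelOptCostImpl.FACTORY costs plans by row count
        // only. A rate/window-aware factory would implement RelOptCostFactory
        // itself and hand out cost objects carrying both statistics.
        RelOptCostFactory costFactory = RelOptCostImpl.FACTORY;

        // The planner takes the factory at construction time, so the ordering
        // defined by the produced costs drives plan selection.
        VolcanoPlanner planner = new VolcanoPlanner(costFactory, Contexts.empty());
        System.out.println(planner.getCostFactory());
      }
    }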

Streams Query Optimization Using Rate and Window Size

2019-07-02 Thread Alireza Samadian
Dear Members of the Calcite Community, I'm working on Apache Beam SQL, and we use Calcite for query optimization. We represent both tables and streams as subclasses of AbstractQueryableTable. In Calcite's implementation of the cost model and statistics, one of the key elements is the row count. Also, all the
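
To make the "row count is a key element" point concrete, here is a minimal sketch of where that statistic enters: a table reports it through Calcite's Statistic interface. The class extends AbstractTable purely to keep the sketch short (the message says Beam's tables extend AbstractQueryableTable), and the schema, field names, and numbers are made up.

    import java.util.Collections;
    import org.apache.calcite.rel.type.RelDataType;
    import org.apache.calcite.rel.type.RelDataTypeFactory;
    import org.apache.calcite.schema.Statistic;
    import org.apache.calcite.schema.Statistics;
    import org.apache.calcite.schema.impl.AbstractTable;
    import org.apache.calcite.sql.type.SqlTypeName;

    /** Illustration only: a "stream" table whose only statistic is one scalar. */
    public class IllustrativeStreamTable extends AbstractTable {

      private final double estimatedRowsPerSecond; // our own, non-Calcite stat

      public IllustrativeStreamTable(double estimatedRowsPerSecond) {
        this.estimatedRowsPerSecond = estimatedRowsPerSecond;
      }

      @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
        return typeFactory.builder()
            .add("event_ts", SqlTypeName.TIMESTAMP)
            .add("event_value", SqlTypeName.INTEGER)
            .build();
      }

      @Override public Statistic getStatistic() {
        // Calcite's Statistic exposes a single scalar row count; a stream can
        // only smuggle its rate through this one number, which is the issue
        // raised in this thread.
        return Statistics.of(estimatedRowsPerSecond, Collections.emptyList());
      }
    }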