Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-09 Thread Vladimir Sitnikov
Michael>If we want to calibrate
A part of the question is "What should Aggregate#computeSelfCost return?"
A) There's an option to make that method abstract so every sub-class defines its own cost implementation. It might be sad, and it might look like an NLogN duplication all over the place.
B)

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-09 Thread Michael Mior
Having some kind of calibration that could run would be nice :) I suppose single block read times on HDDs are likely stable, but we don't really know where the data is coming from. It could be an HDD, an SSD or even a network service with variable latency. So I'm not convinced we'll ever get

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Vladimir Sitnikov
Technically speaking, single-block read time for HDDs is pretty much stable, so the use of seconds might not be that bad. However, it might be complicated to measure CPU-like activity in seconds (e.g. different machines might execute EnumerableJoin at different rates :( ) What if we benchmark a

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Michael Mior
I understand the cost doesn't have to match actual execution duration and it doesn't really matter if it does as long as we can get the relative ordering of plans roughly similar. That's why I'm suggesting not calling the cost seconds, even if we are trying to roughly approximate them. But I don't

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Vladimir Sitnikov
Michael>although I would be hesitant to refer to "seconds"
Do you have better ideas? If my memory serves me well, PostgreSQL uses seconds as well for cost units. OracleDB is using "singleblock read" for the cost unit.
Michael>how long execution will take on any particular system
The idea for

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Michael Mior
A cost unit sounds fine to me, although I would be hesitant to refer to "seconds" or other concrete measurements since there's no easy way to guess how long execution will take on any particular system. -- Michael Mior mm...@apache.org On Sat, 4 Jan 2020 at 10:56, Vladimir Sitnikov wrote: >

[DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Vladimir Sitnikov
Hi, I've spent some time on stabilizing the costs (see https://github.com/apache/calcite/pull/1702/commits ), and it looks like we might want to have some notion of "cost unit". For instance, we want to express that sorting a table with 2 int columns is cheaper than sorting a table with 22 int