[jira] [Created] (CALCITE-4496) Measure columns ("SELECT ... AS MEASURE")

Julian Hyde (Jira) Thu, 11 Feb 2021 15:32:04 -0800

Julian Hyde created CALCITE-4496:
------------------------------------

             Summary: Measure columns ("SELECT ... AS MEASURE")
                 Key: CALCITE-4496
                 URL: https://issues.apache.org/jira/browse/CALCITE-4496
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde



In multi-dimensional languages such as MDX, DAX, Tableau, you can define 
calculations in your models that can be re-evaluated in other dimensional 
contexts. (The models are often called cubes, and the calculations are often 
called measures.)

In SQL, the model is a view (or a sub-query in the FROM clause) but the columns 
are just values. Suppose you have a private {{Employees}} table, a 
{{Departments}} view that rolls {{Employees}} up to department level and has an 
{{averageSalary}} column. Now suppose you wish to roll up {{averageSalary}} to 
the region level. The values that went into {{averageSalary}} are not available 
to you, either directly or indirectly, so the best you can do is to 
average-the-averages.

In this proposed (and experimental) feature, you can define a special kind of 
column - a measure - in the SELECT list of a view (or sub-query in a FROM 
clause), and it remains a calculation. When a query uses a measure column, the 
calculation is re-evaluated in the context of that query.

To some extent, this breaches the "black box" property of SQL views. Hitherto, 
a SQL view can be replaced with a table that has the same contents, and all 
queries that use that view will return the same results. That property no 
longer holds. But the view remains a useful "hiding" abstraction, and the rows 
that compose that view cannot be viewed directly.

Like dimensional models, measures in SQL would allow high-level abstractions 
such as key-performance indicators (KPIs) to be shared and composed. Unlike 
dimensional models, the models remain relational, namely, it is still possible 
to enumerate and count the rows in a model.

Consider the following view and query that uses it:
{code:sql}
CREATE VIEW EmpSummary AS
SELECT deptno,
    job,
    AVG(sal) AS avg_sal,
    AVG(sal) AS MEASURE avg_sal_measure,
    COUNT(*) + 1 AS MEASURE count_plus_one_measure
FROM Emp
GROUP BY deptno, job;

SELECT deptno,
    AVG(avg_sal) AS a1,
    AGGREGATE(avg_sal_measure) AS a2,
    AGGREGATE(count_plus_one_measure) AS c1
FROM EmpSummary
GROUP BY deptno;{code}

Note that there is a special aggregate function, {{AGGREGATE}}, that rolls up 
measures. Columns {{a1}} and {{a2}} will contain different values; the first 
averages the averages, and the second computes the average from the raw data. 
Column {{c1}} will return the number of employees in each department plus one, 
not rolling up the "plus one" for each distinct job in the department.

This is just a brief sketch illustrating the purpose of measures. This feature 
is experimental, the syntax will no doubt change, and much of the semantics 
(for example, what expressions are valid as measures, whether measures remain 
measures they appear in the SELECT clause of an enclosing query, and what is 
the "context" in which a measure is evaluated) need to be ironed out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (CALCITE-4496) Measure columns ("SELECT ... AS MEASURE")

Reply via email to