On 2018/06/30 3:12 PM, Luuk wrote:
Ok ,my highway to hell start here (regargind the use of SQL)

In SQLite3 you are allowed to do this:
SELECT a,b,c
FROM t1
GROUP BY a

The values of 'b' and 'c' will be taken from a 'random' row...

But if we rewrite this in SQL, i am getting something like this:
SELECT
    a,
    (SELECT MIN(b) FROM T1 WHERE a=t.a) AS B,
    (SELECT MIN(c) FROM T1 WHERE a=t.a) AS C
FROM t1 t
GROUP BY a

QUESTION: How does one get the proper relationship between 'B' and 'C'?,

This question comes up from time to time, and is sometimes misunderstood by SQL users everywhere (or from every Engine I should say).

There is no easy solution and it isn't really an aggregate Set-Algebra problem, so mostly not handled specifically by Engines for aggregate queries.

To just state the problem more fully:

Consider this Table "Fruit" in a Grocer's shop:
SELECT * FROM Fruit;
  --         |              |     |  DaysOn-
  -- Name    |   BasketNo   | Qty |   Shelf
  -- ------- | ------------ | --- | ---------
  -- Apples  |       1      |  40 |     12
  -- Apples  |       2      |  40 |     14
  -- Apples  |       3      |  23 |     16
  -- Oranges |       4      |  40 |     12
  -- Oranges |       5      |  40 |     14
  -- Oranges |       6      |  11 |     16
  -- Apples  |       7      |  7  |     22

An unopened basket contains 40 fruit. We try to use the oldest first, etc.
Now suppose we want to know what is the oldest unopened basket for every kind of fruit. The simplest query in *SQLite* would be:

SELECT Name, MAX(DaysOnShelf) AS MaxAge, BasketNo
  FROM Fruit
 WHERE Qty = 40
 GROUP BY Name
;

  -- Name    | MaxAge |   BasketNo
  -- ------- | ------ | ------------
  -- Apples  |   14   |       2
  -- Oranges |   14   |       5

Note that this ONLY works due to a peculiarity in SQLite, and it is forewarned in the SQLite documents that the non-aggregate field could return ANY random value that qualifies to be in a selected row (i.e. any row that is chosen for aggregation by the WHERE clause filter).

To say exactly what that means: In the query, for the Apples aggregate, two rows "pass" the filter where Qty = 40, namely those with baskets 1 and 2. MAX(DaysOnShelf) for those two rows correctly returns 14, but there is no guarantee that the returned BasketNo will be from THAT specific row, it could have returned (still mathematically correct) Basket 1 in stead of Basket 2.  And, in a next release, it might even do so. You cannot assume what it would be.

In fact, if we change the query to enclose BasketNo too in an aggregate function (MIN() in this case):
SELECT Name, MAX(DaysOnShelf) AS MaxAge, MIN(BasketNo) AS BasketNo
  FROM Fruit
 WHERE Qty = 40
 GROUP BY Name
;

  -- Name    | MaxAge |   BasketNo
  -- ------- | ------ | ------------
  -- Apples  |   14   |       1
  -- Oranges |   14   |       4

Now the 14 and 1 next to Apples are definitely NOT from the same row - which is the entire point of aggregate functions, we want to know the truth over the whole set for each function, it is not intended to pick out values.

Thing is, most developers are precisely interested in the specific row containing the qualifying aggregate.

There are two ways to solve this (using standard SQL in any SQL engine - there might be more ways in specific Engines, such as SQLite).

1 - First find the target aggregate, then look that up against the original (non aggregate) set of records.
2 - Construct a custom taxonomy and deconstruct after.

1 is by far the most used and the easiest to write. It's downside is that it requires a double look-up loop, which in most Engines is fairly efficient anyway.
An example of this:

SELECT X.Name, X.MaxAge, F.BasketNo
  FROM (
    SELECT Name, MAX(DaysOnShelf) AS MaxAge, MAX(Qty) AS Qty
      FROM Fruit
     WHERE Qty = 40
     GROUP BY Name
  ) AS X
  JOIN Fruit AS F ON F.Name = X.Name AND F.DaysOnShelf = X.MaxAge AND F.Qty = X.Qty
;
  -- Name    | MaxAge |   BasketNo
  -- ------- | ------ | ------------
  -- Apples  |   14   |       2
  -- Oranges |   14   |       5


-- The above example shows ALL rows that match (as is the nature of a JOIN).  To only show a single row, the following can work:

SELECT X.Name, X.MaxAge, (
       SELECT F.BasketNo FROM Fruit AS F WHERE F.Name = X.Name AND F.DaysOnShelf = X.MaxAge AND F.Qty = X.Qty LIMIT 1
       ) AS BasketNo
  FROM (
    SELECT Name, MAX(DaysOnShelf) AS MaxAge, MAX(Qty) AS Qty
      FROM Fruit
     WHERE Qty = 40
     GROUP BY Name
  ) AS X
;
  -- Name    | MaxAge |   BasketNo
  -- ------- | ------ | ------------
  -- Apples  |   14   |       2
  -- Oranges |   14   |       5


-- And lastly, an example of achieving the first query with a CTE:

WITH X(Name, MaxAge, Qty) AS (
    SELECT Name, MAX(DaysOnShelf), MAX(Qty)
      FROM Fruit
     WHERE Qty = 40
     GROUP BY Name
)
SELECT X.Name, X.MaxAge, F.BasketNo
  FROM X
  JOIN Fruit AS F ON F.Name = X.Name AND F.DaysOnShelf = X.MaxAge AND F.Qty = X.Qty
;
  -- Name    | MaxAge |   BasketNo
  -- ------- | ------ | ------------
  -- Apples  |   14   |       2
  -- Oranges |   14   |       5



The second way to do it is the custom taxonomy. This simply means we fuse together some fields so they are treated as if they are 1 unit, with the most significant field first. If the Engine sees them as 1 value, they must always be regarded together, and the aggregate function typically acts upon the most significant part of the value[1]. After we get the aggregate returned, we can take them apart again and have the details we want. This is rather easy with TEXT values, so I'm going to use the same example as above to show how it can also be done with Numeric values.

The only real advantage of this method is that it requires only a single lookup loop, though it does come with some string-handling overhead - as always, it's best to test the speed with the target DB.

This first example shows only half the solution for clarity (I am not taking it apart yet) so that you can see what the fused-together values look like[2]. We simply make the Age value into a known-length number by adding it (I chose 1000 since fruit older than 1000 days is, well, unlikely, but you can use much larger values) and then find the MAX() for it:

SELECT X.Name, X.MaxAge_Basket
  FROM (
    SELECT Name, MAX((1000+DaysOnShelf)||':'||BasketNo) AS MaxAge_Basket
      FROM Fruit
     WHERE Qty = 40
     GROUP BY Name
  ) AS X
;
  --         | MaxAge_B-
  -- Name    | asket
  -- ------- | ---------
  -- Apples  | 1014:2
  -- Oranges | 1014:5

[1] "Fusing-together"typically results in a string, though you can do a value-based taxonomy. An example of it is referring to Months by their (Year*100+Month) value, so that September 2017 is 201709 and 202012 is December 2020. Aggregate functions on strings consider the most significant part always to the left. i.e. "012" > "0099999999999" and if you want to compare numbers when they are in strings, you have to make sure the left leading length is similar: "00012" < "00014", etc.

[2] The colon is not technically needed, it's just added here for clarity.


And finally here taking that fused-together bit apart (and subtracting the 1000 we added) to show the two actual values. This guarantees that the shown Basket belongs to that specific MaxAge:

SELECT X.Name, CAST(substr(X.MaxAge_Basket,1,4) AS INT)-1000 AS MaxAge, substr(X.MaxAge_Basket,6) AS Basket
  FROM (
    SELECT Name, MAX((1000+DaysOnShelf)||':'||BasketNo) AS MaxAge_Basket
      FROM Fruit
     WHERE Qty = 40
     GROUP BY Name
  ) AS X
;
  -- Name    | MaxAge | Basket
  -- ------- | ------ | ------
  -- Apples  |   14   |   2
  -- Oranges |   14   |   5


This last example achieves the same using an Integer taxonomy:

SELECT X.Name, X.MaxAge_Basket / 100 AS MaxAge, X.MaxAge_Basket % 100 AS Basket
  FROM (
    SELECT Name, 100 * DaysOnShelf + BasketNo AS MaxAge_Basket
      FROM Fruit
     WHERE Qty = 40
     GROUP BY Name
  ) AS X
;
  -- Name    | MaxAge |    Basket
  -- ------- | ------ | ------------
  -- Apples  |   14   |       2
  -- Oranges |   14   |       5


These work in any Engine.

Hope that all makes sense!
Ryan







_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to