I'd like to use CASE to compare an expression x against a number of candidate values. That's the typical use for "CASE x WHEN", which avoids repeating x for each condition.

The trouble is that one of the possible values is NULL, yet the comparison against each candidate value is done with the = operator. The expression "x = NULL" is meaningless since it will always evaluate to NULL, which CASE interprets as untruth, hence "WHEN NULL THEN" will never accomplish anything.

The workaround is to not use "CASE x WHEN" and instead use "CASE WHEN" and repeat x every time, using = for all non-NULL values and IS for NULL. But this means repeating x for each condition, which is the whole reason I'd prefer "CASE x WHEN".

A compromise is to do both, as follows:

CASE x
WHEN 1 THEN 11
WHEN 2 THEN 22
WHEN 3 THEN 33
WHEN 4 THEN 44
ELSE CASE
WHEN x IS NULL THEN 55
ELSE 66
END END

Or the other way around, so that both instances of x are near each other:

CASE WHEN x IS NULL THEN 55
ELSE CASE x
WHEN 1 THEN 11
WHEN 2 THEN 22
WHEN 3 THEN 33
WHEN 4 THEN 44
ELSE 66
END END

If the CASE statement enumerates all possible values of x (whose range is perhaps guaranteed with a CHECK or FOREIGN KEY constraint), then the ELSE case can stand in for the NULL comparison. But otherwise, the ELSE case will unavoidably collect not only NULL but any other unhandled values.

So I'm wondering: can we do better? I wouldn't want to risk changing the meaning of any existing queries, but it is generally possible to extend from the error space: take something that's currently a syntax error and give it meaning. How about the following?

CASE x
WHEN 1 THEN 11
WHEN 2 THEN 22
WHEN 3 THEN 33
WHEN 4 THEN 44
WHEN IS NULL THEN 55
ELSE 66
END

Or equivalently, "ISNULL" instead of "IS NULL".

This treatment could also be applied to numerous other operators that take an expression as their left-hand side and produce a truth result:

?NOT? LIKE|GLOB|REGEXP|MATCH expr ?ESCAPE expr?
NOTNULL
NOT NULL
< <= > >= != <>
= == (for completeness, even though it's implied)
?NOT? BETWEEN expr AND expr
?NOT? IN list-generation-expression

Giving us syntax such as:

CREATE TABLE fruits (name); [... veggies ... meats ...]
CASE food
WHEN IN fruits THEN 'fruit'
WHEN IN veggies THEN 'veggie'
WHEN IN meats THEN 'meat'
WHEN ISNULL THEN 'unspecified'
ELSE 'candy' END

CASE filename
WHEN GLOB '.*' THEN 'hidden'
WHEN GLOB '*.png' THEN 'image'
WHEN GLOB '*.html' THEN 'webpage'
WHEN REGEXP '\.docx?$' THEN 'MS-Word'
ELSE 'data' END

CASE age
WHEN < 1 THEN 'baby'
WHEN < 3 THEN 'toddler'
WHEN < 5 THEN 'preschooler'
WHEN < 12 THEN 'gradeschooler'
WHEN < 18 THEN 'teenager'
WHEN < 21 THEN 'young adult'
ELSE 'adult' END

CASE hour
WHEN BETWEEN  6   AND  6.5 THEN 'wake'
WHEN BETWEEN  7   AND  7.5 THEN 'breakfast'
WHEN BETWEEN  8   AND  8.5 THEN 'commute'
WHEN BETWEEN 11.5 AND 12.5 THEN 'lunch'
WHEN BETWEEN  9   AND 17   THEN 'work'
WHEN BETWEEN 17.5 AND 18   THEN 'commute'
WHEN BETWEEN 19   AND 19.5 THEN 'dinner'
WHEN BETWEEN 22   AND 24   THEN 'sleep'
WHEN BETWEEN  0   AND  6   THEN 'sleep'
END

To make the above examples more compelling, replace food, filename, age, and hour with complex expressions such as nested queries.

The next evolution in bloat is to also support AND, OR, NOT, and parentheses, allowing the LHS operand of any operator in a complex expression to be omitted, defaulting to CASE's first argument. In the last example above this would allow the two "sleep" cases (or the "commute" cases) to be combined with OR: "WHEN BETWEEN 22 AND 24 OR BETWEEN 0 and 6 THEN 'sleep'". But I imagine this would complicate the parser far beyond any practical benefit.

--
Andy Goth | <andrew.m.goth/at/gmail/dot/com>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to