[sqlite] Warnings for non-deterministic queries?

R Smith Sat, 28 Nov 2015 13:23:02 +0200


On 2015/11/27 9:58 PM, Simon Slavin wrote:
> On 27 Nov 2015, at 6:30pm, R Smith <rsmith at rsweb.co.za> wrote:
>
>> Let me explain better, let's assume the query contains MAX(x)... The result 
>> will be from whichever row contains the max, and if the x was not in the 
>> aggregate function, it will be from some row (which may or may not be the 
>> one with the max value).
>> In both cases, you have NO chance of knowing which row that would be until 
>> the query finishes, so ALWAYS assume an arbitrary row (or 
>> Always-non-deterministic, to use the OP's words).
> You missed the general principle: there may be any number of rows with the 
> max value.  For example, 8,1,6,3,8,4,8,7,6,8 .  So even if you're using MAX 
> you still can't predict which row is supplying the other values.  So as you 
> (R Smith) says, always assume an arbitrary row.

Indeed you are correct - pardon me not making it clearer - the point was 
to demonstrate using an example that would "feel" deterministic and show 
that even in that case the arbitrary-pick should be expected or catered 
for. Your example further highlights this (and is probably the more 
common case).

On 2015/11/27 11:08 PM, T?r?k Edwin wrote:
> On 11/27/2015 08:30 PM, R Smith wrote:
>> On 2015/11/27 4:44 PM, Adam Devita wrote:
>>> I think the OP meant to write:
>>> "If the expression is an aggregate expression, it is evaluated across
>>> all rows in the group. Otherwise, it is evaluated against a single
>>> arbitrarily chosen row from within the group. "
>>>
>>> Is there a way I could programmatically determine that a query is
>>> going to use an arbitrarily chosen row from within the group at query
>>> prepare time?
> Thanks, non-deterministic was probably the wrong term to use.
> I wanted to find situations where a query's result depends on an 
> implementation detail of SQLite, and the behaviour is not fully specified by 
> the query itself,
> i.e. it could change from one version to the next, or even with same version 
> by slight changes to the DB internal structures.
> So far I know of two possible situations like this: the 'arbitrary row 
> choice' in the question above, and order of results in an unordered select.
>
> For the unordered select there is 'PRAGMA reverse_unordered_selects' that I 
> could use to find bugs, and I wanted to know if there is an equivalent 
> functionality for the arbitrary row choice.
> However as shown below this is probably the wrong question to ask, I should 
> avoid using columns that are not part of an aggregate function/group by in 
> the first place (if using group by).
>

That's a fair request and probably achievable, but a simple habit of not 
expecting specific row-picks should do.

Also, the row-order problem (and accompanying pragma) is not quite in 
the same class as the arbitrary-row-picked problem.

The simplest way to put this (that I can think of) is that with the 
unordered-select pragma, both the outputs BEFORE enabling it and AFTER 
enabling it is determinable and precise (at least within SQLite) for the 
chosen query plan. You can further be guaranteed that the outputs (where 
unordered) will be rearranged "the other way round" whenever two or more 
dissimilar rows are produced.

To enable some pragma that will perhaps pick a different arbitrary row 
depends on a lot of factors and cannot possibly be predetermined. There 
is no "other-way-round" that the picks can go and no guarantee the data 
will have other pick-able options, or indeed how many. i.e. your app 
might pass with flying colours now, and still later break when 
different/more data is encountered or the query optimizer picks a 
different query plan in future.

It should not be that hard to find all aggregated queries and check if 
the user (or code using the result) will depend on a column that isn't 
contained in an aggregate function. Even in a rather big system.

Best of luck!
Ryan

[sqlite] Warnings for non-deterministic queries?

Reply via email to