Re: [sqlite] Standard deviation last x entries

Richard Damon Sun, 20 Oct 2019 06:05:11 -0700

In my experience max(x) - min(x) isn't a great measure for this sort of
thing, as it is ultimately dependent on outliers. Something like
75%-tile to 25%-tile (or other similar values) might make more sense.


Ultimately, taking a Standard Deviation (or Variance) of a set of
reading for a set of persons seems a bit unusual as it is mixing two
different statistics. One is that each person themselves have a mean and
Standard Deviation to their readings, and then you have the statistics
of the whole population, with things like the group mean (the mean of
the individual means), the variation of the means ( the Standard
Deviation of the Individual Means), the mean variation (the Mean of the
individual Standard Deviations) and even the variability of the
varibility (The Standard Deviation of the individual Standard deviations).

Note that I would definitely treat the readings from an individual as a
sample, unless the patients were being CONTINUALLY monitored, as
periodic readings ARE samples (and presumably the patient is more
important than the actual readings). I might also be tempted to treat
your total group of patients as a sample, unless you think you are (and
have) treating everyone that falls in the category.

On 10/20/19 8:31 AM, Gabor Grothendieck wrote:
> Another approach is to question whether you really need the
> standard deviation or if any measure of variation would do.
>
> You could use the range, max(x) - min(x) , and just report it as the range
> rather than the standard deviation.  Also range/4 approximates the
> standard deviation (google the range rule)
> although the approximation is not very good.
>
>
>
> On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck
> <ggrothendi...@gmail.com> wrote:
>> There is a stdev function for sqlite here:
>> https://www.sqlite.org/contrib//download/extension-functions.c?get=25
>>
>> On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt <n...@vbrichclient.com> wrote:
>>> Am 12.10.2019 um 16:47 schrieb Bart Smissaert:
>>>> Sorry, I forgot to tell that. It is date column with an integer number.
>>>>
>>>> ID xValue xDate
>>>> ----------------
>>>> 1   130      40123
>>>> 1   120      41232
>>>> 1   140      40582
>>>> 1   100      40888
>>>> 1   110      42541
>>>> 2   140      41225
>>>> 2   130      41589
>>>> 2   150      40872
>>> Because SQLite on Android has at least CTEs available,
>>> (since Android 5 or so), I thought I make an attempt using them...
>>>
>>> For your above sample-set I've created a table Test this way
>>> (with slightly changed Column-Names + different IDs with "gaps"
>>> + a few more sample-records, to cover especially n=1 sets):
>>>
>>> Create Table Test(ID, xVal, xDat);
>>>
>>> Insert Into Test Values(3, 130, 40123);
>>> Insert Into Test Values(3, 120, 41232);
>>> Insert Into Test Values(3, 140, 40582);
>>> Insert Into Test Values(3, 100, 40888);
>>> Insert Into Test Values(3, 110, 42541);
>>>
>>> Insert Into Test Values(5, 140, 41225);
>>> Insert Into Test Values(5, 130, 41589);
>>> Insert Into Test Values(5, 150, 40872);
>>>
>>> Insert Into Test Values(6, 110, 41225);
>>> Insert Into Test Values(6, 115, 41227);
>>>
>>> Insert Into Test Values(9, 105, 41225);
>>>
>>> After the above Table-Creation + Inserts, one should
>>> define a "Base-View" first, which is then able to select
>>> appropriate aggregates (respecting your "max-4" condition):
>>>
>>> Create View vw_StdDev_Max4_Aggregates As
>>> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T
>>> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc
>>> Limit 4)
>>> Group By ID;
>>>
>>> Only this view above has to be adjusted, according to your
>>> real table- and column-names, but the following two "derived views",
>>> will work after these changes as well, as long as you leave the
>>> above view-name - and the field-names: (ID, a1, a2, n) intact.
>>>
>>> So what remains, is two more view-defs (one for the population-StdDev -
>>> and one with a few slight changes, to cover the sample-StdDev as well.
>>>
>>> Create View vw_StdDev_Samples As
>>> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", (
>>>    With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All
>>>      Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>>    ) Select s From r Order By i Desc Limit 1
>>> ) s From vw_StdDev_Max4_Aggregates Where n>1
>>> Union All
>>> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1;
>>>
>>> Create View vw_StdDev_Populations As
>>> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", (
>>>    With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All
>>>      Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>>    ) Select s From r Order By i Desc Limit 1
>>> ) sigma From vw_StdDev_Max4_Aggregates Where n>1
>>> Union All
>>> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1;
>>>
>>> ------------------------------------
>>> Ok, here the result I get on my test-table, when the 3 views
>>> are in place, and the following gets executed:
>>>
>>> Select ID, n_PopulationSize, sigma From vw_StdDev_Populations;
>>> ID  n   sigma
>>> ------------------------
>>> 3   4   14.7901994577491
>>> 5   3   8.16496580927733
>>> 6   2   2.5
>>> 9   1   0
>>>
>>>
>>> Select ID, n_SampleSize, s From vw_StdDev_Samples;
>>> ID  n   s
>>> ------------------------
>>> 3   4   17.0782512765993
>>> 5   3   10.0000000000001
>>> 6   2   3.53553390593274
>>> 9   1   null
>>>
>>> Please note the returned null (vs. the real 0) in the last result
>>> (done, because "unbiased estimates" are undefined for sample-size == 1)
>>>
>>> Finally for those interested, the CTE-based "iterated Square-Root",
>>> cut out for an isolated calculation of sqrt(2).
>>>
>>> With r(s2, s, i) As (Select 2, 1, 1 Union All
>>>    Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>> ) Select s From r Order By i Desc Limit 1
>>>
>>> The max. iterations are limited to 32 - though for input-values
>>> in a "typical range", the iteration will usually exit earlier...
>>> In case of the example above -> sqrt(2), the iteration-counter
>>> i reached only 6 (as can be seen, when you include i alongside s
>>> in the final Select.
>>>
>>> With r(s2, s, i) As (Select 2, 1, 1 Union All
>>>    Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
>>> Limit 32
>>> ) Select i, s From r Order By i Desc Limit 1
>>>
>>> HTH
>>>
>>> Olaf
>>>
>>>
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@mailinglists.sqlite.org
>>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>
>

-- 
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Standard deviation last x entries

Reply via email to