Re: [sqlite] Standard deviation last x entries

Bart Smissaert Sun, 20 Oct 2019 07:07:36 -0700

This is getting far away from SQLite now and moving to medical statistics.
The thing is that the authors of this algorithm have taken a large amount
of patient
data and have calculated what the best predictors are to calculate the
chance for the
individual patient to have a heart attack or stroke in the next 10 years.
For blood pressure
they calculated that SD was the best one to get a figure for the
variability of systolic blood
pressure and they calculated that this SD was relevant.
All explained in the article mentioned in the mentioned link:
https://qrisk.org/three/


RBS

On Sun, Oct 20, 2019 at 2:04 PM Richard Damon <rich...@damon-family.org>
wrote:

> In my experience max(x) - min(x) isn't a great measure for this sort of
> thing, as it is ultimately dependent on outliers. Something like
> 75%-tile to 25%-tile (or other similar values) might make more sense.
>
> Ultimately, taking a Standard Deviation (or Variance) of a set of
> reading for a set of persons seems a bit unusual as it is mixing two
> different statistics. One is that each person themselves have a mean and
> Standard Deviation to their readings, and then you have the statistics
> of the whole population, with things like the group mean (the mean of
> the individual means), the variation of the means ( the Standard
> Deviation of the Individual Means), the mean variation (the Mean of the
> individual Standard Deviations) and even the variability of the
> varibility (The Standard Deviation of the individual Standard deviations).
>
> Note that I would definitely treat the readings from an individual as a
> sample, unless the patients were being CONTINUALLY monitored, as
> periodic readings ARE samples (and presumably the patient is more
> important than the actual readings). I might also be tempted to treat
> your total group of patients as a sample, unless you think you are (and
> have) treating everyone that falls in the category.
>
> On 10/20/19 8:31 AM, Gabor Grothendieck wrote:
> > Another approach is to question whether you really need the
> > standard deviation or if any measure of variation would do.
> >
> > You could use the range, max(x) - min(x) , and just report it as the
> range
> > rather than the standard deviation.  Also range/4 approximates the
> > standard deviation (google the range rule)
> > although the approximation is not very good.
> >
> >
> >
> > On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck
> > <ggrothendi...@gmail.com> wrote:
> >> There is a stdev function for sqlite here:
> >> https://www.sqlite.org/contrib//download/extension-functions.c?get=25
> >>
> >> On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt <n...@vbrichclient.com>
> wrote:
> >>> Am 12.10.2019 um 16:47 schrieb Bart Smissaert:
> >>>> Sorry, I forgot to tell that. It is date column with an integer
> number.
> >>>>
> >>>> ID xValue xDate
> >>>> ----------------
> >>>> 1   130      40123
> >>>> 1   120      41232
> >>>> 1   140      40582
> >>>> 1   100      40888
> >>>> 1   110      42541
> >>>> 2   140      41225
> >>>> 2   130      41589
> >>>> 2   150      40872
> >>> Because SQLite on Android has at least CTEs available,
> >>> (since Android 5 or so), I thought I make an attempt using them...
> >>>
> >>> For your above sample-set I've created a table Test this way
> >>> (with slightly changed Column-Names + different IDs with "gaps"
> >>> + a few more sample-records, to cover especially n=1 sets):
> >>>
> >>> Create Table Test(ID, xVal, xDat);
> >>>
> >>> Insert Into Test Values(3, 130, 40123);
> >>> Insert Into Test Values(3, 120, 41232);
> >>> Insert Into Test Values(3, 140, 40582);
> >>> Insert Into Test Values(3, 100, 40888);
> >>> Insert Into Test Values(3, 110, 42541);
> >>>
> >>> Insert Into Test Values(5, 140, 41225);
> >>> Insert Into Test Values(5, 130, 41589);
> >>> Insert Into Test Values(5, 150, 40872);
> >>>
> >>> Insert Into Test Values(6, 110, 41225);
> >>> Insert Into Test Values(6, 115, 41227);
> >>>
> >>> Insert Into Test Values(9, 105, 41225);
> >>>
> >>> After the above Table-Creation + Inserts, one should
> >>> define a "Base-View" first, which is then able to select
> >>> appropriate aggregates (respecting your "max-4" condition):
> >>>
> >>> Create View vw_StdDev_Max4_Aggregates As
> >>> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T
> >>> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc
> >>> Limit 4)
> >>> Group By ID;
> >>>
> >>> Only this view above has to be adjusted, according to your
> >>> real table- and column-names, but the following two "derived views",
> >>> will work after these changes as well, as long as you leave the
> >>> above view-name - and the field-names: (ID, a1, a2, n) intact.
> >>>
> >>> So what remains, is two more view-defs (one for the population-StdDev -
> >>> and one with a few slight changes, to cover the sample-StdDev as well.
> >>>
> >>> Create View vw_StdDev_Samples As
> >>> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", (
> >>>    With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All
> >>>      Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> >>> Limit 32
> >>>    ) Select s From r Order By i Desc Limit 1
> >>> ) s From vw_StdDev_Max4_Aggregates Where n>1
> >>> Union All
> >>> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1;
> >>>
> >>> Create View vw_StdDev_Populations As
> >>> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", (
> >>>    With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All
> >>>      Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> >>> Limit 32
> >>>    ) Select s From r Order By i Desc Limit 1
> >>> ) sigma From vw_StdDev_Max4_Aggregates Where n>1
> >>> Union All
> >>> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1;
> >>>
> >>> ------------------------------------
> >>> Ok, here the result I get on my test-table, when the 3 views
> >>> are in place, and the following gets executed:
> >>>
> >>> Select ID, n_PopulationSize, sigma From vw_StdDev_Populations;
> >>> ID  n   sigma
> >>> ------------------------
> >>> 3   4   14.7901994577491
> >>> 5   3   8.16496580927733
> >>> 6   2   2.5
> >>> 9   1   0
> >>>
> >>>
> >>> Select ID, n_SampleSize, s From vw_StdDev_Samples;
> >>> ID  n   s
> >>> ------------------------
> >>> 3   4   17.0782512765993
> >>> 5   3   10.0000000000001
> >>> 6   2   3.53553390593274
> >>> 9   1   null
> >>>
> >>> Please note the returned null (vs. the real 0) in the last result
> >>> (done, because "unbiased estimates" are undefined for sample-size == 1)
> >>>
> >>> Finally for those interested, the CTE-based "iterated Square-Root",
> >>> cut out for an isolated calculation of sqrt(2).
> >>>
> >>> With r(s2, s, i) As (Select 2, 1, 1 Union All
> >>>    Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> >>> Limit 32
> >>> ) Select s From r Order By i Desc Limit 1
> >>>
> >>> The max. iterations are limited to 32 - though for input-values
> >>> in a "typical range", the iteration will usually exit earlier...
> >>> In case of the example above -> sqrt(2), the iteration-counter
> >>> i reached only 6 (as can be seen, when you include i alongside s
> >>> in the final Select.
> >>>
> >>> With r(s2, s, i) As (Select 2, 1, 1 Union All
> >>>    Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12
> >>> Limit 32
> >>> ) Select i, s From r Order By i Desc Limit 1
> >>>
> >>> HTH
> >>>
> >>> Olaf
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> sqlite-users mailing list
> >>> sqlite-users@mailinglists.sqlite.org
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >>
> >> --
> >> Statistics & Software Consulting
> >> GKX Group, GKX Associates Inc.
> >> tel: 1-877-GKX-GROUP
> >> email: ggrothendieck at gmail.com
> >
> >
>
> --
> Richard Damon
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Standard deviation last x entries

Reply via email to