This is getting far away from SQLite now and moving to medical statistics. The thing is that the authors of this algorithm have taken a large amount of patient data and have calculated what the best predictors are to calculate the chance for the individual patient to have a heart attack or stroke in the next 10 years. For blood pressure they calculated that SD was the best one to get a figure for the variability of systolic blood pressure and they calculated that this SD was relevant. All explained in the article mentioned in the mentioned link: https://qrisk.org/three/
RBS On Sun, Oct 20, 2019 at 2:04 PM Richard Damon <rich...@damon-family.org> wrote: > In my experience max(x) - min(x) isn't a great measure for this sort of > thing, as it is ultimately dependent on outliers. Something like > 75%-tile to 25%-tile (or other similar values) might make more sense. > > Ultimately, taking a Standard Deviation (or Variance) of a set of > reading for a set of persons seems a bit unusual as it is mixing two > different statistics. One is that each person themselves have a mean and > Standard Deviation to their readings, and then you have the statistics > of the whole population, with things like the group mean (the mean of > the individual means), the variation of the means ( the Standard > Deviation of the Individual Means), the mean variation (the Mean of the > individual Standard Deviations) and even the variability of the > varibility (The Standard Deviation of the individual Standard deviations). > > Note that I would definitely treat the readings from an individual as a > sample, unless the patients were being CONTINUALLY monitored, as > periodic readings ARE samples (and presumably the patient is more > important than the actual readings). I might also be tempted to treat > your total group of patients as a sample, unless you think you are (and > have) treating everyone that falls in the category. > > On 10/20/19 8:31 AM, Gabor Grothendieck wrote: > > Another approach is to question whether you really need the > > standard deviation or if any measure of variation would do. > > > > You could use the range, max(x) - min(x) , and just report it as the > range > > rather than the standard deviation. Also range/4 approximates the > > standard deviation (google the range rule) > > although the approximation is not very good. > > > > > > > > On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck > > <ggrothendi...@gmail.com> wrote: > >> There is a stdev function for sqlite here: > >> https://www.sqlite.org/contrib//download/extension-functions.c?get=25 > >> > >> On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt <n...@vbrichclient.com> > wrote: > >>> Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > >>>> Sorry, I forgot to tell that. It is date column with an integer > number. > >>>> > >>>> ID xValue xDate > >>>> ---------------- > >>>> 1 130 40123 > >>>> 1 120 41232 > >>>> 1 140 40582 > >>>> 1 100 40888 > >>>> 1 110 42541 > >>>> 2 140 41225 > >>>> 2 130 41589 > >>>> 2 150 40872 > >>> Because SQLite on Android has at least CTEs available, > >>> (since Android 5 or so), I thought I make an attempt using them... > >>> > >>> For your above sample-set I've created a table Test this way > >>> (with slightly changed Column-Names + different IDs with "gaps" > >>> + a few more sample-records, to cover especially n=1 sets): > >>> > >>> Create Table Test(ID, xVal, xDat); > >>> > >>> Insert Into Test Values(3, 130, 40123); > >>> Insert Into Test Values(3, 120, 41232); > >>> Insert Into Test Values(3, 140, 40582); > >>> Insert Into Test Values(3, 100, 40888); > >>> Insert Into Test Values(3, 110, 42541); > >>> > >>> Insert Into Test Values(5, 140, 41225); > >>> Insert Into Test Values(5, 130, 41589); > >>> Insert Into Test Values(5, 150, 40872); > >>> > >>> Insert Into Test Values(6, 110, 41225); > >>> Insert Into Test Values(6, 115, 41227); > >>> > >>> Insert Into Test Values(9, 105, 41225); > >>> > >>> After the above Table-Creation + Inserts, one should > >>> define a "Base-View" first, which is then able to select > >>> appropriate aggregates (respecting your "max-4" condition): > >>> > >>> Create View vw_StdDev_Max4_Aggregates As > >>> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > >>> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > >>> Limit 4) > >>> Group By ID; > >>> > >>> Only this view above has to be adjusted, according to your > >>> real table- and column-names, but the following two "derived views", > >>> will work after these changes as well, as long as you leave the > >>> above view-name - and the field-names: (ID, a1, a2, n) intact. > >>> > >>> So what remains, is two more view-defs (one for the population-StdDev - > >>> and one with a few slight changes, to cover the sample-StdDev as well. > >>> > >>> Create View vw_StdDev_Samples As > >>> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( > >>> With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > >>> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > >>> Limit 32 > >>> ) Select s From r Order By i Desc Limit 1 > >>> ) s From vw_StdDev_Max4_Aggregates Where n>1 > >>> Union All > >>> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > >>> > >>> Create View vw_StdDev_Populations As > >>> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( > >>> With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > >>> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > >>> Limit 32 > >>> ) Select s From r Order By i Desc Limit 1 > >>> ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > >>> Union All > >>> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > >>> > >>> ------------------------------------ > >>> Ok, here the result I get on my test-table, when the 3 views > >>> are in place, and the following gets executed: > >>> > >>> Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > >>> ID n sigma > >>> ------------------------ > >>> 3 4 14.7901994577491 > >>> 5 3 8.16496580927733 > >>> 6 2 2.5 > >>> 9 1 0 > >>> > >>> > >>> Select ID, n_SampleSize, s From vw_StdDev_Samples; > >>> ID n s > >>> ------------------------ > >>> 3 4 17.0782512765993 > >>> 5 3 10.0000000000001 > >>> 6 2 3.53553390593274 > >>> 9 1 null > >>> > >>> Please note the returned null (vs. the real 0) in the last result > >>> (done, because "unbiased estimates" are undefined for sample-size == 1) > >>> > >>> Finally for those interested, the CTE-based "iterated Square-Root", > >>> cut out for an isolated calculation of sqrt(2). > >>> > >>> With r(s2, s, i) As (Select 2, 1, 1 Union All > >>> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > >>> Limit 32 > >>> ) Select s From r Order By i Desc Limit 1 > >>> > >>> The max. iterations are limited to 32 - though for input-values > >>> in a "typical range", the iteration will usually exit earlier... > >>> In case of the example above -> sqrt(2), the iteration-counter > >>> i reached only 6 (as can be seen, when you include i alongside s > >>> in the final Select. > >>> > >>> With r(s2, s, i) As (Select 2, 1, 1 Union All > >>> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > >>> Limit 32 > >>> ) Select i, s From r Order By i Desc Limit 1 > >>> > >>> HTH > >>> > >>> Olaf > >>> > >>> > >>> > >>> _______________________________________________ > >>> sqlite-users mailing list > >>> sqlite-users@mailinglists.sqlite.org > >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > >> > >> > >> -- > >> Statistics & Software Consulting > >> GKX Group, GKX Associates Inc. > >> tel: 1-877-GKX-GROUP > >> email: ggrothendieck at gmail.com > > > > > > -- > Richard Damon > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users