Re: [sqlite] Percentile function in SQLite 3.8.0 (for the .NET version)

RSmith Sat, 30 Nov 2013 07:02:04 -0800

Hi Hayden,

The most usual form of percentile function (I believe) is where you have record values in a selection set (or "sample" from a"population" for the astute statisticians), and you wish to know if you sorted them all by value, and then grouped them in clustersof 1% so that you have a 100 such clusters in a row all numbered from 1 to 100, then in which n-th of those little clusters wouldentry x be?


Of course in real maths the "clusters" are really exact values and non-integer and in 
a set [ c | 0 < c < 100 ].

One of the short-cut methods of getting a percentile is simple indexing, but it won't give us an accurate x.xxx-th percentile, onlyan integer n-th percentile - which might suffice for your application and is a lot quicker to do.

Indexing simply requires ordering the data-values from small to large, then finding "the first value bigger than your value" in thelist, subtract 1, divide by the total number of values in the list ( Count() ) and round it to find value that will be very veryclose to the actual percentile (but not exact).

Similarly, finding which value represents the n-th percentile where you specify the percentile and need the value for it (as is yourquestion's case), you could simply find the n-th percentile index or indices and see which value(s) are in there, the last value ofwhich (in case of multiple indices) usually represents it the best, especially if you reverse-engineer it back to a percentile.


This shortcut in SQL terms then:

SELECT `latency` FROM `latencytable` WHERE 1 ORDER BY `latency` ASC LIMIT (SELECT (Count() / ( 100 / ?1 ) ) FROM `latencytable`WHERE 1), 1;


The value ?1 in this case needs to be the required percentile, so if you are 
looking for the 25th percentile, then ?1 = 25.
(Note: ?1 = 0 may throw exceptions )

Essentially the query counts the values, finds the x-th percentile index (call it n), then lists the table ordered ascendingstarting from offset n and listing only 1 value, et voila.

Now this will be accurate enough on large samples. On really small samples where values are widely differing it will be moreaccurate to find a set of every index that falls within a percentile and then averaging it - which can easily be added to the above,but should not be needed and will take much more processing.


And if you have sets that are very very small (<50 items) then you should not 
have a need to know the percentile.


Hope this helps!
Ryan

(PS. in the query, I use "WHERE 1" and lots of quotes and spaces which all help to see the query structure better, but they are verysuperfluous in SQLite terms and may well be omitted.)




On 2013/11/30 01:38, Hayden Livingston wrote:

Is there any documentation or use case for percentile queries? I've only
seen the source code checked in by "drh".

Is it like the EXCEL 2013 function?

I have a table {operationId, latency) with a set of rows that have floating
values for latencies of a given operation.

I want to find the 25th %ile latency for this rowset.

(A) How do I create a Percentile Query?
(B) How do I make it work for the SQLite System.Data.Sqlite adapter in .NET?

Many thanks!
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Percentile function in SQLite 3.8.0 (for the .NET version)

Reply via email to