On Tue, 19 Feb 2013 05:37:38 -0800
Frank Chang <frankchan...@gmail.com> wrote:

> Would anyone know how to use Sqlite to calculate the streakedness of
> data? The definition of streakedness is show below. Thank you for
> your help.
> 
> [EDIT] From our company's chief software architect, here is the
> requirement for a statistical measure. Could someone please define a
> statistical formula based onour architect's definition of data
> streakedness? -- February 19th 2013 8:45 AM

One way to go about this is to ask your architect the criteria by which
cases A-F were sorted.  IOW, what is the definition of "less than"?  It
seems to be:

0.  non-streak has length zero
1.  longest streak wins
2.  if tied for longest streak, more instances of longest rank wins
3.  if tied for #2, repeat (1,2) with next longest streak

But is that true? It seems like you might be measuring serial
correlation, for which you could conventionally use e.g.
Durbin-Watson.  

To use something like your "streakedness" (not
streakiness?) measure is very difficult without some math
functions that SQLite doesn't have.  Essentially, you want

        sum( N * R^S )

where 

        R is the longest possible streak
        "^" is exponentiation
        S is the streak size
        N is the number of streaks of length S

Using this table

select name, N, streak from cases;
name        N           streak    
----------  ----------  ----------
A           1           13        
B           1           7         
B           1           6         
C           1           7         
C           1           6         
D           1           7         
D           1           3         
D           1           2         
E           1           7         
E           1           3         
F           1           7         

in SQL Server, the query would be:

select name
        , sum( N * power(biggest, streak) ) as rank 
from cases 
cross join (
        select 1.0*max(streak) as biggest 
        from cases
) as A 
group by name 
order by rank desc; 
name rank                                     
 ---- -----------------------------------------
 A                            302875106592253.0
 B                                   67575326.0
 C                                   67575326.0
 D                                   62750883.0
 E                                   62750714.0
 F                                   62748517.0

Without the power() function, you'd need a table of all possible values
of R^S, then
        
        select name, sum(N * p.value) as rank
        from cases 
        join powers as p
        on cases.streak = p.S
        where p.R = (
                select 1.0*max(streak) as biggest 
                from cases
        ) 
        group by name, order by rank desc;

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to