Re: [sqlite] Break on cumulative sum

James K. Lowden Sun, 03 Mar 2013 01:34:09 -0800

On Sat, 2 Mar 2013 14:46:40 +0100
Gert Van Assche <ger...@datamundi.be> wrote:


> All, I don't know how to achieve this: I need to put the cumulative
> sum in a field, and create a group as soon as that cumulative sum is
> over a breakpoint value (10).

This was a bit of a challenge because the group definition is IMO a
little strange: The first value *after* the breakpoint is included in
the group.  That made the SQL more convoluted than it would be
otherwise.  

That said, the attached SQL does the job in one statement.  No triggers
are needed.  

$ sqlite3 db < running_sum.sql
TextField   ValueField  CumulativeValue  BreakPoint  GroupName 
----------  ----------  ---------------  ----------  ----------
A           2           2                10          1         
B           3           5                10          1         
C           2           7                10          1         
D           4           11               10          1         
E           5           5                10          2         
F           1           6                10          2         
G           1           7                10          2         
H           5           12               10          2         
I           11          11               10          3         
J           8           8                10          4         
K           2           10               10          4         

This lets you define the results in terms of the base data instead of
keeping a table of derived values.  No updates required, just a view.
Unless you have data-warehouse rowcounts or a very limited machine, it
should perform just fine, despite what one might guess, because the I/O
is limited to one table.  It should also outperform any per-row
solution.  

Two bits of advice if I may.  

Be careful of using triggers to maintain data consistency.  Triggers
are very good for enforcing referential integrity in situations that
DRI can't handle. Beyond that they get very complex and error prone.  

Using triggers to compute derived values signals redundancy in the
database.  That redundancy often is not as desirable as the
denormalize-for-performance crowd thinks.  Better to compute derived
values on demand (as in a view).  If that's demonstrated to be too
slow, a periodic update in a cron job or similar should provide better
thoughput than recomputing on every insert/update/delete.  You'll also
do yourself a favor by segregating derived values in a different table
that can be regenerated at will.  

My other suggestion is to eschew abstract nouns in column names.  Words
like data, field, and value add no meaning.  After all, it's a
database.  If it's not a value, what is it?  

This table, 

CREATE TABLE [Test] 
        ( Name
        , Value
        , RunningSum
        , Breakpoint /* (one word, small 'p') */ DEFAULT 10
        , GroupName
        , primary key (Name)
        ); 

is at least  as clear, wouldn't you say?  

HTH.  

--jkl

P.S. In case the listserv strips attachments, here it is in plain
text.  

[SQL]
select   A.TextField, A.ValueField
       , A.total - coalesce(C.GroupTotal, 0) as CumulativeValue
       , A.Breakpoint
--       , coalesce(C.GroupTotal, 0) as GroupTotal
       , coalesce(B.GroupName, A.GroupName) as GroupName
from (
        select   a.*, sum(b.ValueField) as total
               , 1 + sum(b.ValueField) / (1+a.Breakpoint) as GroupName
        from Test as a join Test as b
        on a.TextField >= b.TextField
        group by a.TextField
                , a.ValueField
                , a.CumulativeValue
                , a.BreakPoint
) as A
left join (
        select   a.*, sum(b.ValueField) as total
               , 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupName
        from Test as a join Test as b
        on a.TextField >= b.TextField
        group by a.TextField
                , a.ValueField
                , a.CumulativeValue
                , a.BreakPoint
) as B
on B.TextField = (select max(TextField) from Test where TextField <
A.TextField) left join (
        select    1 + T.GroupName as GroupName
                , max(T.total) as GroupTotal
        from (
                select    A.TextField
                        , A.ValueField
                        , A.total
                        , coalesce(B.GroupName, 1) as GroupName
                from (
                        select   a.*, sum(b.ValueField) as total
                               , 1 + sum(b.ValueField) / (1 +
a.Breakpoint) as GroupName from Test as a join Test as b
                        on a.TextField >= b.TextField
                        group by  a.TextField
                                , a.ValueField
                                , a.CumulativeValue
                                , a.BreakPoint
                                ) as A
                left join (
                        select   a.*, sum(b.ValueField) as total
                               , 1 + sum(b.ValueField) / (1 +
a.Breakpoint) as GroupName from Test as a join Test as b
                        on a.TextField > b.TextField
                        group by  a.TextField
                                , a.ValueField
                                , a.CumulativeValue
                                , a.BreakPoint
                ) as B
                on B.TextField = ( select max(TextField) 
                                   from Test 
                                   where TextField <= A.TextField  )
        ) as T
        group by GroupName
        ) as C
on B.GroupName = C.GroupName
;
[LQS]
> 
> This is an example table:
> 
> CREATE TABLE [Test] (TextField, ValueField, CumulativeValue,
> BreakPoint DEFAULT 10, GroupName);
> insert into [Test] values('A', '2', null, '10');
> insert into [Test] values('B', '3', null, '10');
> insert into [Test] values('C', '2', null, '10');
> insert into [Test] values('D', '4', null, '10');
> insert into [Test] values('E', '5', null, '10');
> insert into [Test] values('F', '1', null, '10');
> insert into [Test] values('G', '1', null, '10');
> insert into [Test] values('H', '5', null, '10');
> insert into [Test] values('I', '11', null, '10');
> insert into [Test] values('J', '8', null, '10');
> insert into [Test] values('K', '2', null, '10');
> 
> I'd like to end up with a table that looks like this:
> 
> ?TextField ValueField CumulativeValue BreakPoint GroupName
> A 2 2 10 1
> B 3 5 10 1
> C 2 7 10 1
> D 4 11 10 1
> E 5 5 10 2
> F 1 6 10 2
> G 1 7 10 2
> H 5 12 10 2
> I 11 11 10 3
> J 8 8 10 4
> K 2 2 10 4
> 
> I spent hours trying to update the CumulativeValue field untill the
> BreakPoint value is crossed, and restarting the cumulative counter,
> but I have too little sql knowledge to do this.
> 
> Could anyone help me?
> 
> thanks
> 
> gert
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Break on cumulative sum

Reply via email to