On 16 Apr 2012, at 11:25pm, Puneet Kishor <[email protected]> wrote:

> I absolutely don't get any of the above. Why is "keeping the data" worse than 
> keeping the commands? I am not even sure what is a command vs. what is a 
> data. A SQL command has data embedded in it. I mean, if I have
> 
>       UPDATE t 
>       SET name = 'foo', val = 3.1415
>       WHERE id = 22;
> 
> which part was the command and which part was the data?

The command is the three rows above.  Imagine you'd passed that command to 
_exec().  So as well as executing that command you write it to a file somewhere:

logid   timestamp       command
163742  2347634.133     UPDATE t SET name = 'foo', val = 3.1415 WHERE id = 22;

In terms of the data, I'm not sure whether you're keeping copies of the entire 
row, or just triplets.  If you're keeping a copy of the row every time the row 
changes then you would need to store

logid   timestamp       tablename       id      name    val     col3    col4    
col5    ...
213876  2347634.133     t               22      foo     3.1415  6       fred    
x       ...

if instead you are storing triplets then you would need to store two rows of 
data

logid   timestamp       tablename       id      column  value
824724  2347634.133     t               22      name    foo
824725  2347634.133     t               22      val     3.1415

Obviously I just made up the column names off the top of my head, and I didn't 
invent a mechanism for noting DELETE.

On 16 Apr 2012, at 11:10pm, Nico Williams <[email protected]> wrote:

> The nice thing about having all historical and current and future data
> in one table is that you can:
> 
> a) trivially review the past,
> b) trivially create future changes that become effective as time passes.

I agree that this has its advantages.  I suppose it comes down to what kind of 
use you're going to make of the data.

>> Whether you are keeping copies of the rows in the table, or timestamping SQL 
>> commands, I suggest that for SQLite your timestamps should be unixepoch 
>> stored as a REAL rather than a text expression of seconds.
> 
> Why REAL instead of INTEGER?

Because with sufficient resolution they are unique, which gives you an 
understanding of which change was made before which other change.  If you use 
just integer seconds you can be left with two changes with the same timestamp.



On 16 Apr 2012, at 11:15pm, Petite Abeille <[email protected]> wrote:

> But, in practice, how would one use such DML logs? Say someone got a small 
> million rows, with over a period of time where subjected to 10x that many 
> DMLs (update, insert, delete). How would one now practically use these DML 
> logs to query data at two different point in times? Replay all the logs from 
> the beginning each and every time?

Good question.  You can take periodical snapshots of your entire database, and 
log those together with your log of changes.  So to restore your data as of 
time T, you would

1) restore the latest snapshot made before time T
2) then replay all commands issued after that snapshot but before time T.

Now, consider what it takes if you're logging value changes instead of commands 
issued.  Your first problem is figuring out which rows exist.  Are you storing 
triplets, or complete rows ?  What SELECTs are you going to do ?

Simon.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to