Re: [sqlite] Conflict between snapshots and checkpoints

Keith Medcalf Sun, 29 Sep 2019 05:14:08 -0700

On Sunday, 29 September, 2019 01:28, Gwendal Roué <gwendal.r...@gmail.com> 
wrote:


>But now I fail to understand the indented use case of sqlite3 snapshots..
>Why allow to reuse snapshots with several calls to open()? Why do they
>exist at all, since we can already profit from snapshot isolation with
>one transaction (at the cost of keeping it open)?

The sqlite3_snapshot_* interfaces allow you to record information about 
transactions maintained by the WAL transaction manager.  The interface DOES NOT 
manage the WAL transaction manager nor does it manage your so-called snapshots 
(which are not snapshots).  It allows you to "get" information about a 
particular point-in-time location in the WAL transaction log and "goto" a 
specific point-in-time in the WAL transaction log, to "free" the memory used to 
record this information, and to "cmp" which of two structures point to a later 
point-in-time in the WAL transaction log.

They do NOT modify or control the WAL transaction system or the WAL file.  You 
are responsible for doing this.

https://sqlite.org/wal.html

So, *IF* you want to maintain the availability of a specific point in time in 
the WAL transaction history, they YOU must make sure that YOU do not erase that 
particular point-in-time location by checkpointing the WAL transaction log.  
You have the capability of doing this by disabling the autocheckpoint mechanism 
and controlling when you do checkpoints.  The WAL transaction system neither 
knows nor cares that you happen to have recorded some information about a 
particular point-in-time offset in the WAL file.  

It is sort of like "No Smoking" signs.  The sign does not mean that you cannot 
smoke.  It means that someone posted a sign that says "No Smoking".  The fact 
that there is a sign and what it says is an entirely separate fact that has no 
bearing at all on whether or not one may smoke, nor does the sign somehow 
magically prevent smoking, nor will it prevent you from catching on fire and 
smoking as a result thereof.  It is merely a sign that has something written on 
it which someone stuck up on the wall (or whatever support structure it is 
stuck to).  And what is written on the sign is entirely independent of any 
other state of affairs.

>For information, my quest for snapshot protection has met some success (
>https://github.com/groue/GRDB.swift/pull/625)

This appears to be updating a wrapper interface to SQLite3 so that if you 
happen to have a point-in-time location recorded, then the automatic WAL 
checkpoint system is disabled, plus doing some prevention to protect users of 
the wrapper from modifying (checkpointing) the WAL transaction log if they 
happen to have some recorded "points-in-time" that are dependent on not 
modifying the WAL transaction log.  (ie, things that you should be doing 
yourself so as to not shoot yourself in the foot).

>Given I control the connections to a given database file, I am able to
>open one writer connection, and, on demand, up to N read-only connections.

>Those N reader connections allow concurrent database reads. Those "reads" are
>generally wrapped in a deferred transaction which provides snapshot
>isolation.   

No, it provides REPEATABLE-READ isolation.  There is no actual "snapshot" taken 
and no snapshot exists.  You are merely not seeing data written to the WAL 
transaction log at a point-in-time subsequent to the point in time at which you 
commenced the "repeatable-read".

The sqlite3_snapshot_* interface merely provides a way to record information 
about this repeatable-read point-in-time position in the WAL file so that you 
can go back to that "repeatable-read" point-in-time location sometime in the 
future IF IT STILL EXISTS in the WAL transaction log.  Since YOU are in control 
of whether or not the WAL file is checkpointed, then YOU are in control of 
whether this point-in-time still exists in the WAL transaction log.  If you 
remove data from the WAL transaction log by performing a checkpoint (which 
moves transactions into the main database file) then this recorded 
"point-in-time" may no longer exist in the WAL transaction log.

>At the end of this transaction, the reader connection becomes
>available for another read.

Yes.  And by default when the transaction starts it gets a repeatable-read 
isolation "point-in-time" stamp corresponding to the last transaction written 
to the WAL transaction log which has been committed, or gets the current 
point-in-time representing the "current database" if there are no committed 
transactions in the WAL transaction log.

>One can now create a "snapshot". Those snapshots use the same pool of N
>readers: snapshot reads are then wrapped in a deferred transaction and
>sqlite3_snapshot_open. At the end of this transaction, the reader
>connection becomes available for another regular read or for another
>snapshot read.

There is the error.  There is no "snapshot".  You do not "create a snapshot".  
You record a point-in-time marker location within the WAL transaction log.  
Think of it as taking a picture of the state of the moving van after it has 
been half filled.  You can go back to the point-in-time represented by this 
picture right up until the moving van drives to the new location and unloads 
those boxes at your new house (does a checkpoint).  When this happens, you 
still have the picture, but now a new van has arrived to be loaded up and the 
old one is gone forever.  While you still might have the picture, it is just a 
bunch of data that is not useful for anything except entertainment.

>In order to protect unused snapshots, checkpoints are prevented as long
>as there exists snapshots, with sqlite3_wal_hook().

There are no "snapshots".  There are merely recorded point-in-time markers from 
the WAL transaction log.  If you modify the WAL transaction log by performing a 
checkpoint operation, your "point-in-time" may no longer be accessible because 
that "point-in-time marker" may no longer exist in the WAL transaction log.

>I **really** hope this protects snapshots for good. Of course, a
>confirmation from knowledgeable people would be appreciated :-)

Yes.  If you do not remove transactions from the WAL transaction log then all 
the point-in-time markers before each of these transactions will still exist.  
As soon as you move a committed transaction from the WAL file to the main 
database (this is called a checkpoint), the "point-in-time" marker prior to 
that transaction no longer exists in the WAL file, so you "cannot get there 
from here" (you cannot go to that point-in-time location because it is no 
longer recorded in the transaction log).

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Conflict between snapshots and checkpoints

Reply via email to