Huh, fascinating stuff. I'm not an sqlite developer but I can shed light on
some of your questions.

On Tue, 12 Feb 2019 at 09:54, Edwin Török <[email protected]> wrote:

> A very conservative interpretation of various fsync bugs in various OS
> kernels [2][5] would suggest that:
>
> #1. the list of known OS issues [3] should be updated with an entry
> similar to: "Linux kernels <4.13 do not guarantee to report errors
> encountered during writeback on next fsync, therefore data corruption
> can occur without SQLite knowing about it.", see [4]:
>

I haven't read all the links yet so forgive me if this is answered there,
but when you refer to versions <4.13 is that referring to eg. the ext3/ext4
implementation in those versions, or is it a wider problem which affects
_all_ filesystems? (I'm particularly interested in whether lustre is
affected)


> #2. errors not reported on fsync, but on close
>
> According to [5] "A writeback error may not actually be reported by
> fsync(), however; other calls, such as close(), could return it. "
> AFAICT sqlite3 only logs errors from close, and doesn't surface them to
> the caller.
> It is unclear the exact kernel versions that are affected by this, IIUC
> from [2] then >= 4.16 would not be affect Postgresql beacuse it always
> does an fsync before close.
> Does SQLite follow WWPD here, or is it possible that SQLite3 in one
> process calls close without fsync, gets the writeback error reported
> there (which it only logs), and another process calls fsync and gets
> success, wrongly concluding that the data has safely reached the disk?
> (because you only get an error from an inode reported at most once)
>

You're correct that sqlite doesn't relay errors from close() back to the
application. In normal circumstances however, it will always call fsync()
before close(), because an fsync() is involved at the end of each
transaction in sqlite.

The exception is if you've played with the SYNCHRONOUS pragma, in which
case you've explicitly asked sqlite to skip the fsync() and in this
configuration an application can end up writing a corrupt DB without
getting an error from sqlite (I have seen this happen in practice). But I'm
hard pressed to fault sqlite for this behaviour, and the app can workaround
it by resetting the SYNCHRONOUS pragma to the default and committing one
last transaction to trigger fsync() before closing the DB. At least in
theory -- as I said I haven't been through the links so maybe the bugs mean
this workaround isn't reliable :)


> #3 how does this affect multiple processes accessing same sqlite
> database?
> If inode errors are reported at most once, could it be that a writeback
> error from changes made by process A actually get reported to process
> B, and process A never learns about it?
> If process A would always call fsync and close before relinquishing the
> lock on the journal/WAL I think this wouldn't happen, but does SQLite
> guarantee that?
>

sqlite guarantees that only one writer is active at any time. Ie. if
process A is updating the DB via write() calls, the locking protocol
ensures that no other process will call write() or fsync(), and in fact not
even read() (because the DB might be in a partially-updated inconsistent
state) before process A has finalised the transaction and called fsync().

sqlite does not close() the main DB's file descriptor while the application
holds the connection open, only after sqlite3_close() is called.

Thanks for the info,
-Rowan
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to