Hi Rowan!

>> Shutting down power right after a successfully committed
>> transaction rolls back that transaction on next startup.
> 
> nitpick: This is sqlite behaving as advertised. See
> https://www.sqlite.org/lockingv3.html section 5.0 step 6, and
> https://www.sqlite.org/atomiccommit.html section 3.11 which explain that
> the presence of a [well-formed] journal file is the mechanism via which
> sqlite discriminates between a committed and in-progress transaction.
> 
> ie. according to sqlite a transaction is *not* successfully committed if
> the journal file is still present, so its well within its rights to
> rollback in this scenario.

Yes, a hot journal file means an incomplete transaction and should
absolutly roll back. What actually bugs me is that this happens after
COMMIT has returned without an error.

> That said, syncing the directory doesn't sound like a terrible idea. But
> it's not clear to me that the cost of another sync every transaction is
> worth a marginal reduction in the power-failure-leads-to-rollback window.

The actual costs are not clear to me either. I hope that someone on the
list with more experience in that field knows more.

> That's if it even reduces the window; it wouldn't surprise me to find that
> the journal is removed from disk after the same delay both with and without
> dirsync, the difference being that the dirsync prevents sqlite from
> returning control to your code until its certain the commit has persisted.

The filesystem used (QNX6) is advertised as performing a full sync on
every fsync and automatically some delay time after each write (10s),
and it looks like it is really doing so. If it wouldn't sync on fsync,
or if fsync would just wait until the automatic sync is through, I would
surely open a ticket with QNX and have some stern words with their resident.

That said I would expect the dirsync to shorten the window of
unnecessary rollback by an average of 5s in our case, which may or may
not be relevant to other users. On other filesystems with longer times
until it syncs anyway (I remember 30s from my first Linux box) the
reduction might be more substantial.

> There's certainly a surprising result here:
> 
>     if (sqlite3_exec("COMMIT") == SQLITE_OK) {
>         /* post-commit logic */
>     }
> 
> Logically, the code in the if block can reasonably assume that the
> transaction will not rollback.

And that's where the potential for a tradeoff lies. Should SQLite
guarantee that in post-commit the transaction is done, durably, even for
the cost of the additional? Or should it not guarantee durability an be
a bit (or much, I don't know) faster?

Personally I tend to assume a database is made for guaranteeing ACID
properties, except when the user explicitly decides otherwise. So in my
eyes an fsync plus possibly an #ifdef for explicitly disabling it would
be the cleanest thing. OTOH it might surprise users that are used to the
higher performance/lower load and don't care for durability, so an
#ifdef for explicitly enabling the fsync instead might still be a good
compromise.

> But as you have discovered this is not
> always true with JOURNAL_MODE=DELETE unless the dirsync is performed. To be
> fair I don't think there are many power-failure scenarios where the
> post-commit logic would have a chance to do anything significant, so the
> incorrect assumption will usually be moot.

Yes, in a normal power failure you ?only? win more time, a few seconds
or more, depending on the filesystem's details.

BTW., as you have mentioned JOURNAL_MODE=DELETE?I have not examined what
happens in the other journal modes, i.e. I'm not sure whether
invalidating the journal by other means has the same lack of sync or
not. A proper patch might need to involve the other modes, too.

> In your case it sounds like a controlled shutdown - is there a reason you
> don't do a full disk sync before that?

Yes, it is a controlled shutdown, so in my case the /* post-commit logic
*/ basically pulls the plug.

Trouble is that I only control the database, not the shutdown procedure
(this is a commercial product with several hundred people working on
different aspects of the system). So while I can try to ask the shutdown
crew to sync like any real computer would do, I ultimately have no
saying in that but still need to ensure survival of data.

Without the patch (which I do have applied locally, of course), me and
everyone with a similar usecase get into trouble for relying on the
quite excellent reputation of SQLite. Actually I first fingerpointed to
the flash hardware or its driver, because ?SQLite is well-tested and
doesn't have this kind of bugs? :-)

S.M.
-- 
Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer
Am Wolfsmantel 46, 91058 Tennenlohe, Germany
Tel: +49-8458-3332-531  stefan.meinlschmidt at esolutions.de
Fax: +49-8458-3332-20-531

Reply via email to