Hi Rowan! >> Shutting down power right after a successfully committed >> transaction rolls back that transaction on next startup. > > nitpick: This is sqlite behaving as advertised. See > https://www.sqlite.org/lockingv3.html section 5.0 step 6, and > https://www.sqlite.org/atomiccommit.html section 3.11 which explain that > the presence of a [well-formed] journal file is the mechanism via which > sqlite discriminates between a committed and in-progress transaction. > > ie. according to sqlite a transaction is *not* successfully committed if > the journal file is still present, so its well within its rights to > rollback in this scenario.
Yes, a hot journal file means an incomplete transaction and should absolutly roll back. What actually bugs me is that this happens after COMMIT has returned without an error. > That said, syncing the directory doesn't sound like a terrible idea. But > it's not clear to me that the cost of another sync every transaction is > worth a marginal reduction in the power-failure-leads-to-rollback window. The actual costs are not clear to me either. I hope that someone on the list with more experience in that field knows more. > That's if it even reduces the window; it wouldn't surprise me to find that > the journal is removed from disk after the same delay both with and without > dirsync, the difference being that the dirsync prevents sqlite from > returning control to your code until its certain the commit has persisted. The filesystem used (QNX6) is advertised as performing a full sync on every fsync and automatically some delay time after each write (10s), and it looks like it is really doing so. If it wouldn't sync on fsync, or if fsync would just wait until the automatic sync is through, I would surely open a ticket with QNX and have some stern words with their resident. That said I would expect the dirsync to shorten the window of unnecessary rollback by an average of 5s in our case, which may or may not be relevant to other users. On other filesystems with longer times until it syncs anyway (I remember 30s from my first Linux box) the reduction might be more substantial. > There's certainly a surprising result here: > > if (sqlite3_exec("COMMIT") == SQLITE_OK) { > /* post-commit logic */ > } > > Logically, the code in the if block can reasonably assume that the > transaction will not rollback. And that's where the potential for a tradeoff lies. Should SQLite guarantee that in post-commit the transaction is done, durably, even for the cost of the additional? Or should it not guarantee durability an be a bit (or much, I don't know) faster? Personally I tend to assume a database is made for guaranteeing ACID properties, except when the user explicitly decides otherwise. So in my eyes an fsync plus possibly an #ifdef for explicitly disabling it would be the cleanest thing. OTOH it might surprise users that are used to the higher performance/lower load and don't care for durability, so an #ifdef for explicitly enabling the fsync instead might still be a good compromise. > But as you have discovered this is not > always true with JOURNAL_MODE=DELETE unless the dirsync is performed. To be > fair I don't think there are many power-failure scenarios where the > post-commit logic would have a chance to do anything significant, so the > incorrect assumption will usually be moot. Yes, in a normal power failure you ?only? win more time, a few seconds or more, depending on the filesystem's details. BTW., as you have mentioned JOURNAL_MODE=DELETE?I have not examined what happens in the other journal modes, i.e. I'm not sure whether invalidating the journal by other means has the same lack of sync or not. A proper patch might need to involve the other modes, too. > In your case it sounds like a controlled shutdown - is there a reason you > don't do a full disk sync before that? Yes, it is a controlled shutdown, so in my case the /* post-commit logic */ basically pulls the plug. Trouble is that I only control the database, not the shutdown procedure (this is a commercial product with several hundred people working on different aspects of the system). So while I can try to ask the shutdown crew to sync like any real computer would do, I ultimately have no saying in that but still need to ensure survival of data. Without the patch (which I do have applied locally, of course), me and everyone with a similar usecase get into trouble for relying on the quite excellent reputation of SQLite. Actually I first fingerpointed to the flash hardware or its driver, because ?SQLite is well-tested and doesn't have this kind of bugs? :-) S.M. -- Dipl.-Phys. (Univ) Stefan Meinlschmidt, Senior Software Engineer Am Wolfsmantel 46, 91058 Tennenlohe, Germany Tel: +49-8458-3332-531 stefan.meinlschmidt at esolutions.de Fax: +49-8458-3332-20-531