Hello,
"D. Richard Hipp" <[EMAIL PROTECTED]>
24/06/2004 06:04 AM
To:
cc: [EMAIL PROTECTED]
Subject: Re: [sqlite] database corruption

> Michael Robinette wrote:
> > ...
> You present a new and novel approach to corrupting the database, which
> is to combine a database file with a journal from a different database
> into the same directory. We'll be thinking about what to prevent this
> attack in the 6 days that remain before we freeze the 3.0.0 database
> format.

This is actually a variant of the corruption that fsync()ing the directory containing your journal on each commit is designed to prevent. If the directory entry is not synced, a power failure can leave you with an old journal file instead of the one that matches the current database state. Obviously, this variant is a solved problem, while others are not.

The variant I'm most concerned about is actually a copy operation. User A says to himself, "They're just files, I'll copy them onto my backup media." This will often appear to work, so he won't be concerned. One day he restores the files and "weird things" start happening. I'm not sure there's a solution to that other than user education, or an operating-system-level implementation of the journalling itself that treats a copy operation the same as any other kind of database read.

Ultimately, the ideal world would have sqlite-style journalling built into the kernel VFS layer. Hrrmm... I've heard that Windows Longhorn might incorporate this kind of function. Perhaps we should be pushing for its introduction into other operating systems. It fits well with any other file operation where you want readers to always see a consistent state of the data. It might also make sqlite a touch lighter and more focused. Hrrm.
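The directory sync mentioned above can be sketched in C for a POSIX system. This is a minimal illustration, not sqlite's actual pager code: `fsync_directory()` and `write_journal_durably()` are hypothetical helper names, and the sketch assumes (as on Linux) that a directory can be opened read-only and passed to fsync(). The point is the last step: syncing the journal's contents alone does not guarantee that its directory entry survives a crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* fsync() a directory so that entries created, renamed or unlinked in it
 * reach the disk. Assumes the OS allows fsync() on a read-only directory
 * descriptor, as Linux does. Returns 0 on success, -1 on failure. */
static int fsync_directory(const char *dir_path) {
    int fd = open(dir_path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: write a journal, sync its data, then sync the
 * directory holding it. Only after this returns 0 would it be safe to
 * start overwriting pages in the main database file. */
static int write_journal_durably(const char *dir_path, const char *journal_path,
                                 const void *buf, size_t len) {
    int fd = open(journal_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Step 1: make the journal's contents durable. A short write is
     * treated as failure to keep the sketch simple. */
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    /* Step 2: make the journal's directory entry itself durable. */
    return fsync_directory(dir_path);
}
```

Following the same reasoning, a directory sync would presumably also be wanted after the journal is deleted at commit time, so that an old journal cannot reappear after a power failure.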
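For the copy scenario, one cheap mitigation a backup script could apply is to refuse to copy a database file on its own while a rollback journal sits next to it. The sketch below assumes sqlite's `<database>-journal` naming; `journal_present()` is a hypothetical helper, not part of any sqlite API. The check is racy: a writer can start a transaction immediately after it returns, so it only helps when no process can write during the copy, such as an embedded setup where you control when writes occur.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: checks for "<db_path>-journal" (sqlite's rollback
 * journal naming). Returns 1 if the journal exists, 0 if it does not, and
 * -1 on error. A result of 1 means the database file alone is not a safe
 * thing to copy. */
static int journal_present(const char *db_path) {
    char journal[4096];
    int n = snprintf(journal, sizeof journal, "%s-journal", db_path);
    if (n < 0 || (size_t)n >= sizeof journal)
        return -1; /* path too long or formatting error */
    struct stat st;
    if (stat(journal, &st) == 0)
        return 1; /* a transaction may be in progress, or a crashed one may need rollback */
    return errno == ENOENT ? 0 : -1;
}
```

A backup script would copy the database only when `journal_present()` returns 0, and otherwise retry later rather than copy a half-written file without its journal.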
<research-mode>
<snippet href="http://www.namesys.com/faq.html">
However, although file data may appear to be consistent from the kernel point of view, since there is no API exported to the userspace to control transactions, we may end-up in a situation where the application makes 2 write requests (as part of one logical transaction) but only one of these gets journaled before the system crashes. From the application point of view, we may then end up with inconsistent data in the file. Such issues should be addressed with the upcoming ReiserFS v.4 release. Such an API will be exported to userspace and all programs that need transactions will be able to use it.
</snippet>
<snippet href="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
[...] one other thing that I want to do is to actually export the nested transaction API into userspace. You have to be very, very careful about that because it's not possible to guarantee proper database semantics. You can't have unbounded, large transactions. You have to have some way in which the user application can get in advance some idea of how many disk blocks it's going to need to modify for the operation, because it's going to call various things like that which are not entirely straight forward; it's not quite as simple as people would hope. But it's sufficiently useful that that will be exported to userspace at some point.
</snippet>
<snippet href="http://lists.linux-ha.org/pipermail/linux-ha/1999-May/007901.html">
A user-visible transaction API is something entirely different. No way does it belong in the kernel.
</snippet>
<snippet href="http://seclists.org/lists/linux-kernel/2003/Sep/1364.html">
There will be a new API to support userspace-controlled multifile transactions. At first stab, multifile transactions will be used internally to implement extended attributes.
Now, another question is.. will the transaction API support commit() and rollback()?
*grin*
</snippet>
<snippet href="http://www.linuxjournal.com/article.php?sid=4466">
From time to time, people ask for a version of the transaction API exported to user space. The ReiserFS journal layer was designed to support finite operations that usually complete very quickly, and it would not be a good fit for a general transaction subsystem. It might be a good idea to provide atomic writes to user space, however, and give them more control over grouping operations together. That way an application could request for a 64K file to be created in a certain directory and treat it like an atomic operation. Very little planning has happened in this area thus far.
</snippet>
<summary>
A full transaction API will probably never be exported by the kernel itself, however some basic hooks may eventually be provided if enough people can agree on what those hooks should be. Most of the work would be performed in user-space.
</summary>
</research-mode>

Thoughts:

* The brief window during which sqlite now leaves the main database in an inconsistent state makes the copy scenario less likely to be a problem, but it can still occur occasionally.
* In an embedded scenario you can control when writes occur, so it's unlikely that a copy operation would happen while a transaction is active.
* If we can identify a small number of system-call operations that a kernel would need to supply to implement transactions, and that fit reasonably with current journalled-filesystem thinking, it's possible that this might influence development. The sqlite pager might eventually become an implicit part of libc...
* The sqlite pager interface must be nice and clean, with clear operations.

Well, that's my tangent email for today. I hope you all enjoyed it :)

Benjamin

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]