Hello,
"D. Richard Hipp" <[EMAIL PROTECTED]>
24/06/2004 06:04 AM
To:
cc: [EMAIL PROTECTED]
Subject: Re: [sqlite] database corruption
> Michael Robinette wrote:
> > ...
> You present a new and novel approach to corrupting the database, which
> is to combine a database file with a journal from a different database
> into the same directory. We'll be thinking about how to prevent this
> attack in the 6 days that remain before we freeze the 3.0.0 database
> format.
This is actually a variant of a corruption scenario that fsync()ing the
directory containing your journal on each commit is designed to solve. An
unsynced directory entry may mean that, after a power failure, an old
journal file exists instead of the one that relates to the current
database state. So this variant is a solved problem, while others are not.
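For anyone who hasn't seen the directory-sync trick, here is a minimal POSIX sketch of what it involves. The helper name write_journal_durably() and its arguments are hypothetical, not sqlite's actual pager code. It assumes Linux-style semantics, where fsync() on a directory file descriptor flushes that directory's entries; some platforms reject fsync() on a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not sqlite's code): create or truncate the journal,
 * write its contents, then make both the file data and its directory
 * entry durable. Returns 0 on success, -1 on error. */
static int write_journal_durably(const char *dir, const char *journal_path,
                                 const void *buf, size_t len)
{
    assert(dir != NULL && journal_path != NULL);

    int fd = open(journal_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    /* Treat a short write as an error to keep the sketch small. */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0) return -1;

    /* fsync() on the file alone does not guarantee that the directory
     * entry (name -> inode link) has reached the disk, so sync the
     * containing directory as well. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0) return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

Without the final directory fsync(), a crash can leave the journal's data on disk while its name still points at an older journal, which is the scenario described above.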
The variant I'm most concerned about is actually a copy operation. User A
says to himself "they're just files, I'll copy them onto my backup media".
This will often appear to work, so he won't be concerned. One day he
restores the files and "weird things" start happening.
I'm not sure there's a solution to that, other than user education or an
operating-system-level implementation of the journalling itself that
treats a copy operation the same as other kinds of database reads.
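Short of OS support, a backup tool can at least avoid the worst case with a heuristic. The sketch below is hypothetical (backup_if_quiescent() is not part of sqlite). It assumes the rollback journal lives next to the database as "<name>-journal", refuses to copy while one exists, and re-checks the journal and the file's size and mtime afterwards. It takes no lock, so a transaction that starts and finishes within one mtime tick can still slip through.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of sqlite. Returns 1 if "<db>-journal"
 * exists (or the name is too long to check, treated as busy), else 0. */
static int journal_exists(const char *db)
{
    char path[4096];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s-journal", db);
    if (n < 0 || (size_t)n >= sizeof path) return 1;
    return stat(path, &st) == 0;
}

/* Plain byte-for-byte copy; returns 0 on success, -1 on any I/O error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in) return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) { fclose(in); return -1; }
    char buf[8192];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n) { rc = -1; break; }
    if (ferror(in)) rc = -1;
    if (fclose(out) != 0) rc = -1;
    fclose(in);
    return rc;
}

/* Returns 0 if the copy looks consistent, 1 if a writer seemed active
 * (retry later), -1 on I/O error. A heuristic, not a lock. */
static int backup_if_quiescent(const char *db, const char *dst)
{
    struct stat before, after;
    assert(db != NULL && dst != NULL);
    if (journal_exists(db)) return 1;
    if (stat(db, &before) != 0) return -1;
    if (copy_file(db, dst) != 0) return -1;
    if (stat(db, &after) != 0) return -1;
    /* A journal appearing, or a size/mtime change, means a writer ran. */
    if (journal_exists(db) || before.st_size != after.st_size ||
        before.st_mtime != after.st_mtime)
        return 1;
    return 0;
}
```

This only narrows the window; it doesn't remove the need for user education.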
Ultimately, the ideal world would have sqlite journalling built into the
kernel VFS layer. Hrrmm... I've heard that Windows Longhorn might
incorporate this kind of function. Perhaps we should be pushing for its
introduction into other operating systems. It fits naturally alongside
other file operations where you want readers to always see a consistent
state of the data. It might also make sqlite just a touch lighter and
more focused.
Hrrm.
<research-mode>
<snippet href="http://www.namesys.com/faq.html">
However, although file data may appear to be consistent from the kernel
point of view, since there is no API exported to the userspace to control
transactions, we may end-up in a situation where the application makes 2
write requests (as part of one logical transaction) but only one of these
gets journaled before the system crashes. From the application point of
view, we may then end up with inconsistent data in the file.
Such issues should be addressed with the upcoming ReiserFS v.4 release.
Such an API will be exported to userspace and all programs that need
transactions will be able to use it.
</snippet>
<snippet
href="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
[...] one other thing that I want to do is to actually export the nested
transaction API into userspace. You have to be very, very careful about
that because it's not possible to guarantee proper database semantics. You
can't have unbounded, large transactions. You have to have some way in
which the user application can get in advance some idea of how many disk
blocks it's going to need to modify for the operation, because it's going
to call various things like that which are not entirely straightforward;
it's not quite as simple as people would hope. But it's sufficiently
useful that that will be exported to userspace at some point.
</snippet>
<snippet
href="http://lists.linux-ha.org/pipermail/linux-ha/1999-May/007901.html">
A user-visible transaction API is something entirely different. No way
does it belong in the kernel.
</snippet>
<snippet href="http://seclists.org/lists/linux-kernel/2003/Sep/1364.html">
There will be a new API to support userspace-controlled
multifile transactions.
At first stab, multifile transactions will be used internally to
implement extended attributes.
Now, another question is.. will the transaction API support commit() and
rollback()? *grin*
</snippet>
<snippet href="http://www.linuxjournal.com/article.php?sid=4466">
From time to time, people ask for a version of the transaction API
exported to user space. The ReiserFS journal layer was designed to support
finite operations that usually complete very quickly, and it would not be
a good fit for a general transaction subsystem. It might be a good idea to
provide atomic writes to user space, however, and give them more control
over grouping operations together. That way an application could request
for a 64K file to be created in a certain directory and treat it like an
atomic operation. Very little planning has happened in this area thus far.
</snippet>
<summary>
A full transaction API will probably never be exported by
the kernel itself; however, some basic hooks may eventually be provided if
enough people can agree on what those hooks should be. Most of the work
would be performed in user space.
</summary>
</research-mode>
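Pending any kernel hooks, the "atomic 64K file" case from the last snippet can already be approximated in user space with the write-to-temp-then-rename idiom. This is a standard POSIX technique, not something proposed in the snippets; the helper atomic_replace() and its ".tmp" naming are assumptions for illustration. It gives atomic replacement of a single file, not a multi-file transaction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` (which lives in `dir`) with `len`
 * bytes from `buf`, so readers see either the old or the new contents,
 * never a mix. Returns 0 on success, -1 on error. */
static int atomic_replace(const char *dir, const char *path,
                          const void *buf, size_t len)
{
    assert(dir != NULL && path != NULL);
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    const char *p = buf;
    size_t left = len;
    while (left > 0) {                     /* handle short writes */
        ssize_t w = write(fd, p, left);
        if (w < 0) { close(fd); unlink(tmp); return -1; }
        p += w;
        left -= (size_t)w;
    }
    /* New contents must be on disk before the name points at them. */
    if (fsync(fd) != 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) != 0) { unlink(tmp); return -1; }

    /* rename() replaces the directory entry atomically on POSIX systems. */
    if (rename(tmp, path) != 0) { unlink(tmp); return -1; }

    /* Make the rename itself durable across a power failure. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0) return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```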
Thoughts:
* The brief period during which sqlite now leaves the main database in an
inconsistent state makes the copy scenario less likely to be a problem,
but the problem may still occur occasionally.
* In an embedded scenario you can control when writes occur, so it's
unlikely that a copy operation would occur while a transaction is active.
* If we can identify a small number of system-call operations that a
kernel would need to supply to implement transactions, and that fit
reasonably with current journalled-filesystem thinking, it's possible
that this might influence development. The sqlite pager might eventually
become an implicit part of libc...
* The sqlite pager interface must be nice and clean, with clear
operations.
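To make the "small number of operations" idea concrete, here is a hypothetical four-call interface (txn_begin, txn_write, txn_commit, txn_rollback) emulated in user space over a single file descriptor. None of these names exist in any kernel, libc, or sqlite. The undo log is kept in memory, so the sketch shows the intended commit/rollback semantics but is not crash-safe; a real implementation would write the undo records to a journal and sync it before overwriting the file, which is the job the sqlite pager does today.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical API: undo records live in memory only (not crash-safe). */
struct undo {
    off_t off;          /* where the overwritten bytes lived  */
    size_t len;         /* how many original bytes were saved */
    char *old;          /* the original bytes                 */
    struct undo *next;  /* newer records sit at the list head */
};

struct txn {
    int fd;
    off_t orig_size;    /* file size at txn_begin(), restored on rollback */
    struct undo *log;
};

static int txn_begin(struct txn *t, int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    t->fd = fd;
    t->orig_size = st.st_size;
    t->log = NULL;
    return 0;
}

static int txn_write(struct txn *t, off_t off, const void *buf, size_t len)
{
    assert(t != NULL && (buf != NULL || len == 0));
    struct undo *u = malloc(sizeof *u);
    if (!u) return -1;
    u->old = malloc(len ? len : 1);
    if (!u->old) { free(u); return -1; }
    /* Save the bytes about to be overwritten; a read past EOF is short. */
    ssize_t got = pread(t->fd, u->old, len, off);
    if (got < 0) { free(u->old); free(u); return -1; }
    u->off = off;
    u->len = (size_t)got;
    u->next = t->log;
    t->log = u;
    return pwrite(t->fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}

static void txn_free_log(struct txn *t)
{
    while (t->log) {
        struct undo *u = t->log;
        t->log = u->next;
        free(u->old);
        free(u);
    }
}

static int txn_commit(struct txn *t)
{
    txn_free_log(t);            /* keep the changes, discard undo data */
    return fsync(t->fd);
}

static int txn_rollback(struct txn *t)
{
    int rc = 0;
    /* Newest record first, so overlapping writes unwind correctly. */
    for (struct undo *u = t->log; u; u = u->next)
        if (pwrite(t->fd, u->old, u->len, u->off) != (ssize_t)u->len) rc = -1;
    if (ftruncate(t->fd, t->orig_size) != 0) rc = -1;  /* drop extensions */
    txn_free_log(t);
    return rc;
}
```

Four calls are enough to express both outcomes: a rollback restores the bytes and the original length, while a commit simply drops the undo data and syncs.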
Well, that's my tangent email for today. I hope you all enjoyed it :)
Benjamin