Re: [sqlite] database corruption

ben . carlyle Thu, 24 Jun 2004 00:58:57 -0700

Hello,

"D. Richard Hipp" <[EMAIL PROTECTED]>
24/06/2004 06:04 AM

        To: 
        cc:     [EMAIL PROTECTED]
        Subject:        Re: [sqlite] database corruption

> Michael Robinette wrote:
> > ...

> You present a new and novel approach to corrupting the database, which
> is to combine a database file with a journal from a different database
> into the same directory.  We'll be thinking about what to prevent this
> attack in the 6 days that remain before we freeze the 3.0.0 database
> format.

This is actually a variant of the corruption scenario that fsync()ing the
directory containing your journal on each commit is designed to prevent.
An unsynced directory entry may leave an old journal file in place after a
power failure, instead of the one that relates to the current database
state. Obviously, that power-failure variant is a solved problem while
others are not.
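
As a rough illustration of the directory-fsync idea, here is a minimal
sketch in Python. It assumes a POSIX system where a directory can be
opened read-only and passed to fsync(); the function name
write_journal_durably and the file names are hypothetical and are not
taken from the sqlite source.

```python
import os


def write_journal_durably(journal_path: str, payload: bytes) -> None:
    """Write a journal file, then sync both its data and its directory entry.

    Hypothetical helper for illustration only; not sqlite's actual code.
    """
    # Step 1: write the journal contents and force them to stable storage.
    with open(journal_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: fsync the containing directory so the journal's *name* is
    # also durable. Without this, a power failure could leave a stale
    # journal (or none) visible under that name.
    dir_path = os.path.dirname(os.path.abspath(journal_path))
    dir_fd = os.open(dir_path, os.O_RDONLY)  # POSIX assumption
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The second step is the part the paragraph above refers to: syncing the
file alone says nothing about whether its directory entry reached disk.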

The variant I'm most concerned about is actually a copy operation. User A 
says to himself "they're just files, I'll copy them onto my backup media". 
This will often appear to work, so he won't be concerned. One day he 
restores the files and "weird things" start happening.

I'm not sure there's a solution to that, other than user education or an 
operating-system-level implementation of the journalling itself that 
treats a copy operation the same as other kinds of database reads. 
Ultimately the ideal world would have sqlite journalling built into the 
kernel VFS layer. Hrrmm... I've heard that Windows Longhorn might 
incorporate this kind of function. Perhaps we should be pushing for its 
introduction into other operating systems. It's really very compatible 
with other file operations where you might want to do operations that 
ensure readers always see a consistent state of the data. It might also 
make sqlite just a little touch lighter and more focused.
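
For the copy scenario above, one user-level mitigation can be sketched
under a loud assumption: it relies on the online backup API that was added
to later SQLite releases (it did not exist when this thread was written),
reached here through Python's sqlite3 module (Connection.backup, Python
3.7 or newer). The function name backup_database and the paths are
hypothetical.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy a live database through SQLite instead of copying raw files.

    Assumes a SQLite build with the online backup API (later than the
    3.0.0 format discussed in this thread).
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # The backup goes through SQLite's own locking, so the copy is a
        # consistent snapshot and never a database file paired with a
        # missing or mismatched journal.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

This does not help a user who simply drags the files onto backup media; it
only shows what "treating a copy like any other database read" looks like
when it is done in user space.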

Hrrm.
<research-mode>
        <snippet href="http://www.namesys.com/faq.html">
However, although file data may appear to be consistent from the kernel 
point of view, since there is no API exported to the userspace to control 
transactions, we may end-up in a situation where the application makes 2 
write requests (as part of one logical transaction) but only one of these 
gets journaled before the system crashes. From the application point of 
view, we may then end up with inconsistent data in the file. 
Such issues should be addressed with the upcoming ReiserFS v.4 release. 
Such an API will be exported to userspace and all programs that need 
transactions will be able to use it. 
        </snippet>
        <snippet 
href="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
[...] one other thing that I want to do is to actually export the nested 
transaction API into userspace. You have to be very, very careful about 
that because it's not possible to guarantee proper database semantics. You 
can't have unbounded, large transactions. You have to have some way in 
which the user application can get in advance some idea of how many disk 
blocks it's going to need to modify for the operation, because it's going 
to call various things like that which are not entirely straight forward; 
it's not quite as simple as people would hope. But it's sufficiently 
useful that that will be exported to userspace at some point.
        </snippet>
        <snippet 
href="http://lists.linux-ha.org/pipermail/linux-ha/1999-May/007901.html">
A user-visible transaction API is something entirely different.  No way
does it belong in the kernel.
        </snippet>
        <snippet href="http://seclists.org/lists/linux-kernel/2003/Sep/1364.html">

There will be a new API to support userspace-controlled 
multifile transactions. 
At first stab, multifile transactions will be used internally to 
implement extended attributes. 
Now, another question is.. will the transaction API support commit() and 
rollback()? *grin* 

        </snippet>
        <snippet href="http://www.linuxjournal.com/article.php?sid=4466">

From time to time, people ask for a version of the transaction API 
exported to user space. The ReiserFS journal layer was designed to support 
finite operations that usually complete very quickly, and it would not be 
a good fit for a general transaction subsystem. It might be a good idea to 
provide atomic writes to user space, however, and give them more control 
over grouping operations together. That way an application could request 
for a 64K file to be created in a certain directory and treat it like an 
atomic operation. Very little planning has happened in this area thus far. 

        </snippet>
        <summary>
                 A full transaction API will probably never be exported by 
the kernel itself, however some basic hooks may eventually be provided if 
enough people can agree on what those hooks should be. Most of the work 
would be performed in user-space.
        </summary>
</research-mode>

Thoughts:
* The brief period during which sqlite now has an inconsistent state in
the main database makes the copy scenario less likely to be a problem, but
the problem may still occur occasionally.
* In an embedded scenario you can control when writes occur. It's unlikely 
that a copy operation would occur while a transaction is active.
* If we can identify a small number of system-call operations that would 
need to be supplied by a kernel to implement transactions and fit 
reasonably with current journalled-filesystem thinking, it's possible that 
it might influence development. The sqlite pager might eventually become 
an implicit part of libc...
* The sqlite pager interface must be nice and clean, with clear 
operations.
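
To make "a small number of clear operations" concrete, here is a toy
sketch of a rollback-journal pager with four operations: begin,
write_page, commit and rollback. Everything in it is an assumption made
for illustration (the ToyPager class, the 1024-byte page size, the journal
record layout); it is not the sqlite pager, its interface, or its journal
format, and it omits locking and crash recovery.

```python
import os

PAGE_SIZE = 1024  # hypothetical page size for this toy example


class ToyPager:
    """Toy rollback-journal pager; not sqlite's real pager."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.journal_path = db_path + "-journal"
        self.active = False
        if not os.path.exists(db_path):
            open(db_path, "wb").close()

    def begin(self):
        assert not self.active
        self.active = True
        self.journaled = set()  # pages whose original image is saved
        self.orig_size = os.path.getsize(self.db_path)
        self.journal = open(self.journal_path, "wb")

    def read_page(self, pgno):
        with open(self.db_path, "rb") as f:
            f.seek(pgno * PAGE_SIZE)
            # Pages past the end of the file read back as zeros.
            return f.read(PAGE_SIZE).ljust(PAGE_SIZE, b"\0")

    def write_page(self, pgno, data):
        assert self.active and len(data) == PAGE_SIZE
        if pgno not in self.journaled:
            # Save the original page image before overwriting it, and
            # sync the journal before touching the database file.
            self.journal.write(pgno.to_bytes(4, "big") + self.read_page(pgno))
            self.journal.flush()
            os.fsync(self.journal.fileno())
            self.journaled.add(pgno)
        with open(self.db_path, "r+b") as f:
            f.seek(pgno * PAGE_SIZE)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    def commit(self):
        assert self.active
        self.journal.close()
        os.remove(self.journal_path)  # removing the journal commits
        self.active = False

    def rollback(self):
        assert self.active
        self.journal.close()
        record_size = 4 + PAGE_SIZE
        with open(self.journal_path, "rb") as j, open(self.db_path, "r+b") as f:
            while True:
                rec = j.read(record_size)
                if len(rec) < record_size:
                    break
                # Restore each saved page image to its original position.
                f.seek(int.from_bytes(rec[:4], "big") * PAGE_SIZE)
                f.write(rec[4:])
            f.truncate(self.orig_size)
            f.flush()
            os.fsync(f.fileno())
        os.remove(self.journal_path)
        self.active = False
```

A caller would use it as begin(), one or more write_page() calls, then
either commit() or rollback(). The question raised in the list above is
which of these steps a kernel could reasonably take over.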

Well, that's my tangent email for today. I hope you all enjoyed it :)

Benjamin
