Igor Tandetnik writes:

> Karl Tomlinson wrote:
>
>> I really meant, as a first possibility, that writing the pages
>> to the database itself would be performed during the commit
>> process after syncing the replay journal (but the database need
>> not be synced until the journal is about to be removed).
>
> So if I insert large amounts of data, this data would be written
> to disk twice: once into a replay journal, and then again into
> the database file on commit.

Right.

> With rollback journal, it is written once into the database
> (with rollback journal merely noting which new pages have been
> created), and commit is nearly instantaneous.

Thanks.  That is an advantage of the rollback journal approach,
which could be up to twice as fast when throughput is limited by
async IO bandwidth, since the data is written only once.  (Though
if a rollback-journal commit really is instantaneous, a
replay-journal commit would only take two instants.)

On my initial reading of "writes into the rollback journal the
original content of the database pages that are to be altered" at
http://www.sqlite.org/atomiccommit.html I assumed the advantage
would swing the other way when pages are deleted, but of course
deleted pages need not be completely erased, so their content
need not be recorded in the rollback journal.

Perhaps the rollback journal could have this advantage even on
page modifications, when the complete page is being rewritten.  By
writing the new content to a fresh page instead of over the old
page (and not reclaiming the old page until a later transaction),
the rollback journal need not record the content of the old page.
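
To make that concrete, here is a toy sketch of the copy-on-write
idea (nothing like SQLite's actual pager; all names are made up):
the journal records only an 8-byte remap instead of a full page
image.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t Pgno;

    typedef struct Db {
      FILE *dbFile;       /* the database file */
      FILE *journal;      /* rollback journal: remap records only */
      Pgno nextFreePage;  /* trivial stand-in for a free-page allocator */
      int pageSize;
    } Db;

    /* Write a modified page copy-on-write style: the old page keeps
    ** its content (to be reclaimed by a later transaction), so only
    ** a small (old,new) remap record goes into the journal. */
    static int writePageCow(Db *db, Pgno oldPg, const void *newImage){
      Pgno newPg = db->nextFreePage++;

      fwrite(&oldPg, sizeof oldPg, 1, db->journal);
      fwrite(&newPg, sizeof newPg, 1, db->journal);

      fseek(db->dbFile, (long)(newPg - 1) * db->pageSize, SEEK_SET);
      fwrite(newImage, (size_t)db->pageSize, 1, db->dbFile);
      return ferror(db->dbFile) ? -1 : 0;
    }

Rollback then amounts to discarding the remaps; the old images
were never overwritten.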

Note that, if there are situations where the original content of a
page needs to be recorded in the rollback journal, then there may
be an advantage in the replay journal approach, as it knows the
data that it needs to write without needing to read the old
content.

> I'd rather optimize for the common case (transactions
> successfully committed) than the exceptional one.

I'm not actually trying to optimize for the filesystem
interruption case.

I'm trying to optimize for a situation where the required IO
bandwidth is low, so extra async writes are cheap, but there are
many small transactions and the filesystem is shared with many
other apps, so fsyncs can be very expensive.

There will be different "best" solutions for different situations,
but maybe it is possible to disable SQLite's standard journaling
and use the VFS sqlite3_io_methods to implement a virtual
filesystem with its own customized (low-level) journaling system,
as sketched below.
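
For example, this is how the application side could look.  A
minimal sketch: "journalvfs" is a hypothetical shim that would
have to be registered first with sqlite3_vfs_register(), wrapping
the default VFS and intercepting xWrite/xSync in its
sqlite3_io_methods to do its own journaling.  (This also assumes a
SQLite version that supports PRAGMA journal_mode.)

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void){
      sqlite3 *db = 0;

      /* "journalvfs" is hypothetical: a shim VFS registered
      ** elsewhere via sqlite3_vfs_register(). */
      int rc = sqlite3_open_v2("test.db", &db,
                               SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                               "journalvfs");
      if( rc!=SQLITE_OK ){
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
      }

      /* Unsafe unless the VFS really does guarantee atomic commit. */
      sqlite3_exec(db, "PRAGMA journal_mode=OFF;", 0, 0, 0);

      /* ... use the database ... */

      sqlite3_close(db);
      return 0;
    }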

Ken writes:

> Ok, I'll argue: why write the entire page?  Why not just write
> what changed on the page, allowing more information (i.e. more
> than one modification) to be written per redo-journal page?

Potentially a good optimization, thank you.
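
To illustrate the idea (a hypothetical record layout, not any real
SQLite format), a sub-page redo record might look like:

    #include <stdint.h>

    /* One redo record: log only the changed byte range of a page,
    ** so many small modifications can share one journal page. */
    typedef struct RedoRecord {
      uint32_t pgno;    /* database page being modified */
      uint16_t offset;  /* byte offset of the change within the page */
      uint16_t nByte;   /* length of the change */
      /* ...followed by nByte bytes of new content... */
    } RedoRecord;

Replay would seek to pgno*pageSize + offset and write the nByte
payload; nothing about the old content is needed.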

> How often does a write actually modify the entire page?

I don't know the answer to this question.


Igor Tandetnik writes:

>>> SQLite would have to keep track of where in the replay journal each
>>> new page was written, so that if it needed to reload that page for
>>> any reason, it could.  As long as the transaction (and thus the
>>> replay journal) is not too big, this is not a problem.  A hash table
>>> in memory would suffice.  But sometimes transactions do get very
>>> large. For example, with the VACUUM command.  And in those cases,
>>> the amount of memory needed to record the mapping from pages to
>>> offsets into the replay journal could become too large to fit into
>>> memory, especially in memory-constrained devices such as cellphones.
>>
>> The size of the transaction would be something to consider in the
>> decision on when to sync the journal and write its pages to the
>> database.
>
> So presumably, as the transaction grows, at some point you may decide to 
> dump the replay journal into the database file even while the 
> transaction is still open. What if the user decides to roll it back soon 
> afterwards? Wouldn't you need both a replay and a rollback journals 
> then?

Ken writes:

> Yes you would, but then again you'd also want to put the changes
> for the undo journal into the redo journal so that if the "DB"
> crashed it would be able to recover not just the data files but
> the undo as well.

I hadn't considered writing uncommitted transactions to the
database before they were synced to the replay journal.

I imagined that if the transaction got too large to keep in
memory, then it would need to be reread from the journal when it
was written to the database.  That would of course hurt
performance, so cache spill is another situation where the
rollback implementation would perform better.
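
For reference, the in-memory mapping from pages to journal offsets
described in the quoted text could be as simple as this toy
fixed-size chained hash (all names made up):

    #include <stdint.h>
    #include <stdlib.h>

    typedef uint32_t Pgno;

    typedef struct MapEntry {
      Pgno pgno;              /* database page number */
      int64_t jrnlOffset;     /* offset of its latest image in the journal */
      struct MapEntry *pNext; /* hash-bucket chain */
    } MapEntry;

    #define N_BUCKET 1024
    static MapEntry *aBucket[N_BUCKET];

    /* Remember where a page's newest image lives in the journal.
    ** Prepending means lookups always find the newest image first. */
    static void mapInsert(Pgno pgno, int64_t off){
      MapEntry *p = malloc(sizeof *p);
      if( p==0 ) abort();
      p->pgno = pgno;
      p->jrnlOffset = off;
      p->pNext = aBucket[pgno % N_BUCKET];
      aBucket[pgno % N_BUCKET] = p;
    }

    /* Return the journal offset for a page, or -1 if the page is
    ** unmodified and should be read from the database file. */
    static int64_t mapLookup(Pgno pgno){
      const MapEntry *p;
      for(p = aBucket[pgno % N_BUCKET]; p; p = p->pNext){
        if( p->pgno==pgno ) return p->jrnlOffset;
      }
      return -1;
    }

The memory-pressure point in the quote then follows directly: the
map grows with the number of distinct pages touched, which for
VACUUM is roughly the whole database.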