Nathaniel Smith <[EMAIL PROTECTED]> wrote:
> I was wondering if there were any docs or explanations available on
> how SQLite decides to lay out data on disk.

Free pages in the middle of the file are filled first.  Some effort
is made to use pages that are close together for related information.
In monotone, where you seldom if ever delete anything, you probably
never have any free pages, so new information is always added to the
end of the file.
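
If you want to verify that, the freelist_count and page_count pragmas
report how many free pages the file has.  A minimal Python sketch
(the database filename is a placeholder):

    import sqlite3

    db = sqlite3.connect("monotone.db")   # placeholder filename
    free  = db.execute("PRAGMA freelist_count").fetchone()[0]
    total = db.execute("PRAGMA page_count").fetchone()[0]
    # freelist_count of 0 means every new row goes at the end of the file
    print("free pages: %d of %d" % (free, total))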

After you VACUUM, everything will be on disk in row order.  If
you see a big performance improvement after VACUUMing, then the
disk layout is an optimization worth looking into.  If, however
(as I suspect), your performance is similar after vacuuming, then
changing the way information is added to the disk probably will
not help, since after a VACUUM the information is pretty much
optimally ordered for minimum seeking.
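
That experiment is easy to run from Python.  A rough sketch, where
the query is only a stand-in for monotone's real read pattern:

    import sqlite3, time

    def timed(db, sql):
        # One pass over the working set.  For a fair comparison, flush
        # the OS page cache between runs; otherwise the second run is
        # warm and the timing is meaningless.
        t = time.time()
        db.execute(sql).fetchall()
        return time.time() - t

    # isolation_level=None keeps the connection in autocommit mode,
    # which VACUUM requires (it cannot run inside a transaction).
    db = sqlite3.connect("monotone.db", isolation_level=None)
    query = "SELECT data FROM files"      # stand-in for the real workload
    before = timed(db, query)
    db.execute("VACUUM")                  # rewrites the file in row order
    after = timed(db, query)
    print("before VACUUM: %.2fs   after: %.2fs" % (before, after))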

> 
> In particular, consider the problem of reconstructing bitstrings
> given a bunch of xdeltas (basically one-way patches) and full texts;
> a typical problem would be to calculate a chain of deltas that,
> applied one-by-one to a full text, will give the desired text, then
> pull those deltas and the full text out of the db and apply them.
> We just store these text objects as columns in two tables.
> 
> In its current implementation in monotone, this algorithm seems to be
> seek-bound.  Some profiling shows that there are cases where we're
> spending more time blocked waiting on read()s from the db than it
> takes to read the entire db 5 times over.  This makes sense.  We're
> basically doing random reads scattered all over the file, so the OS
> has no way to do any sort of readahead, which means we're probably
> hitting disk seek latency on every read, and as usual these days,
> latency trumps bandwidth.
> 
> So, the question is how to fix this.

If you can provide specific schemas, query examples, and information
on how big the database is, I might be able to help.  So far, the
problem is too vague for me to give much useful input.
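
In outline, though, the access pattern you describe looks something
like the sketch below (Python; the table names and the delta-apply
routine are invented for illustration, not monotone's actual schema):

    import sqlite3

    def apply_delta(text, delta):
        # Stand-in for the real xdelta apply; the details don't
        # matter for the I/O pattern being illustrated.
        raise NotImplementedError

    def reconstruct(db, base_id, chain):
        # One lookup for the full text, then one per delta in the
        # chain.  With the rows scattered all over the file, each
        # fetch can cost a disk seek -- hence seek-bound behavior.
        (text,) = db.execute(
            "SELECT data FROM full_texts WHERE id = ?",
            (base_id,)).fetchone()
        for delta_id in chain:
            (delta,) = db.execute(
                "SELECT data FROM deltas WHERE id = ?",
                (delta_id,)).fetchone()
            text = apply_delta(text, delta)
        return text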

Let me propose a radical solution:  I've been experimenting with adding
a VCS component to CVSTrac (http://www.cvstrac.org/) to replace CVS and
thus provide a complete project management system in a standalone CGI
program.  My latest thinking on this (backed up by experiment) is to
avoid storing long series of xdeltas.  Each file version is instead stored
as a baseline and a single xdelta.  The manifest stores two UUIDs instead
of one.  That way, you can always load a particular file version with
at most two lookups.  As a file evolves, the baseline version stays the
same and the xdelta changes from one version to the next.  When the size
of the xdelta reaches some fraction of the baseline file size, create a
new baseline.  Experimentally, I have found it works well to create a
new baseline when the xdelta becomes 15% of the size of the baseline.
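
The decision rule itself is only a few lines.  A sketch in Python,
where make_delta stands in for whatever xdelta encoder is used:

    def choose_representation(baseline, new_text, make_delta):
        # Each version is stored as (baseline, delta) -- two UUIDs in
        # the manifest -- so any version is at most two lookups.
        delta = make_delta(baseline, new_text)
        if len(delta) <= 0.15 * len(baseline):
            # Delta is still small: keep the old baseline and store
            # the new delta alongside it.
            return baseline, delta
        # Delta has grown past 15% of the baseline: promote the new
        # version to be its own baseline, with an empty delta.
        return new_text, b""

Applied on every commit, this keeps each stored version reachable
from its baseline with a single delta application.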

My studies (based on the revision history of SQLite) indicate that using 
a baseline plus single xdelta representation results in a repository that 
is about 2x to 3x larger than using a long chain of xdeltas.  But access
is faster.  And the repository is still 6x to 10x smaller than a git-style
repository that uses no deltas anywhere.

--
D. Richard Hipp   <[EMAIL PROTECTED]>
