On Mar 3, 2009, at 8:01 PM, Trainor, Chris wrote:

> I'm not sure how much we can do about preventing adds and deletes.   
> It *may* be possible to replace them with updates, but I am not sure  
> yet.  These adds and deletes are happening in a different table than  
> the one being summed.  This other table contains a large blob  
> column.  Would changing to updates help or will updates fragment the  
> database as much as adds and deletes?

SQLite implements an UPDATE by first deleting the old row then  
inserting a new one in its place.  So I don't think changing DELETE/ 
INSERT pairs into UPDATEs will help much with fragmentation.  And,  
besides, deleting and inserting does not really cause much  
fragmentation, as long as the data inserted is roughly the same size  
as the data deleted.

Fragmentation occurs for many reasons, but one important reason is  
that two or more b-trees within the database file are growing at the  
same time.  As each b-tree grows, it needs to allocate new pages.  New  
pages are allocated from the end of the database file (unless there  
were previously deleted pages that can be reused).  Imagine that you  
have (say) 10 b-trees all growing at roughly the same rate.  As the b- 
trees all grow, they will each allocates pages off the end of the file  
as they need time.  And you will end up with pages of the 10 b-tress  
all interleaved rather than being bunched together.

Note that there is one b-tree for each table and for each index.  So  
if you have a single SQL table with 3 unique columns (there is one  
implied index for each UNIQUE constraint) and 2 explicit indices, you  
will have 1+3+2=6 b-trees.  As you insert new information into this  
table, all 6 b-trees are updated together, so there iwill be some  
interleaving and hence fragmentation.

When you run the VACUUM command, it rebuilds each b-tree one by one,  
so all the pages for a single b-tree are bunched together in the file.

Note that using auto_vacuum does *not* help with fragmentation.  In  
fact, auto_vacuum makes fragmentation worse.  Auto_vacuum is designed  
for used on small flash-memory drives (such as found on cell-phones)  
that have low capacity and zero seek latency.  Auto_vacuum is a very  
helpful feature for the right problem, but fragmentation is not the  
right problem.

When there are free pages in the database file and new pages are  
needed by a growing b-tree, an attempt is made to reuse free pages  
that are as close as possible to the rest of the b-tree.  But  
typically the free list is short and the choices are limited, so it  
does not often happen that the chosen free page is immediately  
adjacent to the growing b-tree.

Decades of experience with filesystems have taught us that various  
heuristics can prevent filesystem fragmentation, as long as the  
filesystem is less than about 80% or 90% full.  Once a filesystem gets  
close to being full, fragmentation is inevitable.  To transfer this  
experience to SQLite, recognize that SQLite attempts to keep its  
database file as small as possible.  In other words, SQLite tries to  
keep itself 100% full at all times.  Hence, fragmentation of data in  
SQLite is pretty much inevitable.  One could envision future versions  
of SQLite that allowed you to preallocate a large database files such  
that the database always stayed less than 80% full.  Then we could use  
filesystem techniques to keep fragmentation down.  The penalty, of  
course, is that your database file is larger.  Probably much larger.   
And just to be clear: SQLite does not have that capability at this time.

>
>
> The second option is the one I am considering.  It looks like there  
> might be a good time to run vacuum.  I need to do some more timings  
> to tell for sure.

Do not be tempted by the incremental vacuum feature.  Incremental  
vacuum will reduce the database size as content is deleted, but it  
will not reduce fragmentation.  In fact, incremental vacuum will  
likely increase fragmentation.  Incremental vacuum is just a variation  
on auto_vacuum.  It is designed for flash memory with zero seek latency.

D. Richard Hipp
d...@hwaci.com



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to