Re: [sqlite] Performance problems potentially related to index scalability

Joe Wilson Fri, 09 Feb 2007 19:40:37 -0800

> I understand that I'm experiencing thrashing, but I'm curious as to why I'm
> experiencing it. My understanding is that the row_id PRIMARY KEY column has
> an index on it by default. Inserting into this index doesn't cause
> thrashing, but inserting into the secondary index does. I hoped that
> enabling the shared cache mode and periodically doing the select operation
> on it would keep it in SQLITE's cache. Based on what I saw from I/O stat,
> top, and my timing measurements, that doesn't seem to be the case. In fact,
> my cache "priming" experiment showed that it didn't help at all,  despite a
> cache size of 500 MB and the index being on a column with an INTEGER pragma.
> The transaction immediately following the cache priming would be just as
> slow as the transactions far away from it. Why would a transaction inserting
> 1024 rows of  0.5 KB each result in the eviction of that index, despite the
> abundance of cache?


The OS disk cache is not effective because the index fragments are currently 
scattered across all pages of the table it is indexing. Even VACUUM (as it 
is currently written in sqlite/src/vacuum.c) won't help such indexes be 
contiguous in the file.

The following patch to VACUUM allows pages of non-inline indexes explicitly 
made with CREATE INDEX to be contiguous in the file, which can improve VACUUM
times and non-inline index locality of reference somewhat:

 http://www.sqlite.org/cvstrac/tktview?tn=2075

It works by moving the index creation after the table row inserts (as 
one might do manually) so the index pages are not intermixed with table 
pages.

Unfortunately this patch will not aid inline indexes made within the 
CREATE TABLE statement (i.e., UNIQUE). Nor will this patch help the 
situation when rows are being first inserted into a table (prior to 
VACUUM). So it won't help table population speed at all.

> The solution you propose doesn't work when your table has multiple indices
> of this type. One can imagine altering the schema as to store each indexed
> column in a separate table, with a column containing the rowid to the main
> table, but that would result in a significant amount of redundant overhead.

Don't discount this workaround without first doing timings and database 
size experiments.

If SQLite could be persuaded to reserve contiguous blocks of pages in advance 
for exclusive use for each index, insert times could improve for table with 
more than one index. The database file would be larger while the tables are 
being populated, but it could be made to shrink to a smaller size with an 
explicit VACUUM. I think this time/space trade-off would be acceptable to most 
users. This could be controlled via a PRAGMA if you want the classic paging
behavior.


 
____________________________________________________________________________________
Do you Yahoo!?
Everyone is raving about the all-new Yahoo! Mail beta.
http://new.mail.yahoo.com

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] Performance problems potentially related to index scalability

Reply via email to