[sqlite] a string indexing experiment

Arun Bhalla Wed, 12 Apr 2006 17:01:26 -0700

Hi,

I performed a quick benchmark of three different string indexing schemes for SQLite3.


* Scheme 0 = indexing on the string field
* Scheme 1 = indexing on the MD5 sum (as text in hexadecimal representation) of the string
* Scheme 2 = indexing on the high 64 bits of the MD5 sum (as int) of the string
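
For readers who want to try this, the three schemes might be set up roughly as follows. This is only a sketch in Python, not the original C/C++ benchmark; the table names (s0, s1, s2), column names, and helper functions are hypothetical, and reading the high 64 bits as a signed big-endian integer is an assumption made so the value fits SQLite's INTEGER type.

```python
import hashlib
import sqlite3
import struct


def md5_hex(s):
    """Scheme 1 key: the MD5 sum as 32 hexadecimal characters."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def md5_high64(s):
    """Scheme 2 key: the high 64 bits of the MD5 sum as a signed integer."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    # ">q" interprets the first 8 bytes as a big-endian signed 64-bit value
    # (assumed byte order), which SQLite can store as an INTEGER.
    return struct.unpack(">q", digest[:8])[0]


def make_db():
    """Create one table per scheme, each with its own index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE s0 (str TEXT);
        CREATE INDEX s0_idx ON s0 (str);
        CREATE TABLE s1 (str TEXT, md5 TEXT);
        CREATE INDEX s1_idx ON s1 (md5);
        CREATE TABLE s2 (str TEXT, md5hi INTEGER);
        CREATE INDEX s2_idx ON s2 (md5hi);
    """)
    return conn


def insert_string(conn, s):
    """Store the same string under all three schemes."""
    conn.execute("INSERT INTO s0 (str) VALUES (?)", (s,))
    conn.execute("INSERT INTO s1 (str, md5) VALUES (?, ?)", (s, md5_hex(s)))
    conn.execute("INSERT INTO s2 (str, md5hi) VALUES (?, ?)",
                 (s, md5_high64(s)))


def find(conn, scheme, s):
    """Return True if s is present, looking it up via the given scheme."""
    if scheme == 0:
        sql, args = "SELECT str FROM s0 WHERE str = ?", (s,)
    elif scheme == 1:
        # Assumes full 128-bit MD5 collisions can be ignored.
        sql, args = "SELECT str FROM s1 WHERE md5 = ?", (md5_hex(s),)
    else:
        # A 64-bit prefix can collide, so the string itself is also compared.
        sql, args = ("SELECT str FROM s2 WHERE md5hi = ? AND str = ?",
                     (md5_high64(s), s))
    return conn.execute(sql, args).fetchone() is not None
```

Whether the extra string comparison in the scheme 2 lookup is needed depends on how much collision risk the application can tolerate.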

I varied string size and number of strings and evaluated the schemes on database size and a couple of insertion and retrieval tests each. In general, scheme 2 was quite effective for most cases. Scheme 0 was the best all-around for short strings (16 bytes or less), but in most cases, scheme 2 was not far behind. When working with larger strings, scheme 2 would dominate, with scheme 1 generally having similar performance. SQLite's indexing mechanism (scheme 0) did not scale well in size or performance for large strings.

Strangely, scheme 1 always outperformed scheme 2 in the pure retrieval test, but even when scheme 1 was 50-200% faster than scheme 2, scheme 0 was an order of magnitude slower. Scheme 2 was faster for unique insertion, though, so either scheme 1 or scheme 2 could be useful depending upon the usage model.

Is there a good reason why retrieval with scheme 1 would be faster than retrieval with scheme 2? Scheme 1 involves an index on 32 bytes while scheme 2 involves an index on 8 bytes. I would think that scheme 2 would always be faster than scheme 1 because fewer bytes are involved. Does SQLite index INT fields differently than TEXT fields?

Some background information, if necessary: the benchmark program was written in C/C++ and run on a P4 (x86) Linux box. Inserts were performed in transactions of 1000 each; pure retrievals were not grouped together in transactions. String sizes ranged from 16-1048576 bytes; number of rows in a table ranged from 10K to 10M. Synchronous writes were disabled.
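
The insert setup described above (transactions of 1000 rows, synchronous writes off) could look something like this. Again this is a Python sketch rather than the actual benchmark code; the table name "strings" and the function insert_batched are hypothetical, and it assumes "synchronous writes were disabled" corresponds to PRAGMA synchronous = OFF.

```python
import sqlite3

BATCH_SIZE = 1000  # transaction size mentioned in the post


def insert_batched(conn, rows, batch_size=BATCH_SIZE):
    """Insert rows with one transaction per batch; return the batch count."""
    transactions = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO strings (str) VALUES (?)",
                         [(r,) for r in batch])
        conn.execute("COMMIT")
        transactions += 1
    return transactions


# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA synchronous = OFF")  # assumed meaning of "disabled"
conn.execute("CREATE TABLE strings (str TEXT)")
```

A call such as insert_batched(conn, my_rows) then commits once per 1000 rows instead of once per row.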


Thanks,
Arun
