"Steve Krulewitz" <[EMAIL PROTECTED]> wrote: > Hey all -- > > In the application I am working on (Songbird), we have a simple two > table schema representing the tracks in your music collection and the > properties on those tracks. The keys used are all UUIDs (128 bit > number) which we are currently storing in hex string form in a text > column, so they literally appear in the database as > "ee89b823-e709-40c5-bed7-dcb0b2b791a8". We do lots of lookups and > joins on these keys. > > I was wondering if there is much to be gained by storing these 128 bit > values in binary rather than as strings.
Storing the UUIDs as BLOBs rather than as hex-encoded strings will requires 21 bytes per key instead of 43. So your indices will have about twice their current fanout. That means that you can expect to roughly double your lookup performance on a cold cache. (If the database, or large parts of it, is already in your OS cache, the difference will be much less and will likely not be noticable.) The downside of using BLOBs is that you cannot see them easily during debugging when you do a "SELECT * FROM...". You have to use constructs like, "SELECT hex(uuid), ... FROM" which is more typing. You can fix that with a VIEW, I suppose. The thing to consider is why you are using 128-bit UUIDs in the first place? Presumably you are doing this so that you can sync and/or merge independently created databases without having to worry about key collisions. If you are not doing syncs or merges or independently generated keys, then I can't think of a good reason to use 128-bit UUIDs in the first place. Just use small integer autogenerated keys from SQLite. Assuming you are doing syncing and merging, what I tend to do in these kinds of situations is to create a mapping between the universally unique ID (UUID) and a small integer ID that is unique in the local database - call the latter the RID. The RID is just an integer and can be your INTEGER PRIMARY KEY for very fast lookups. Looking up a record by INTEGER PRIMARY KEY is always twice as fast as looking up the same record by any other key, and because of high fanout can potentially be much faster if you have a cold cache. So I use the RID for joins or other internal operations and only use the UUID for lookups from external data. For example, your schema might look something like this: CREATE TABLE album( aid INTEGER PRIMARY KEY, -- Internally unique album ID uuid TEXT UNIQUE, -- Universally unique album ID ... ); CREATE TABLE track( tid INTEGER PRIMARY KEY, -- Internally unique track ID uuid TEXT UNIQUE, -- Universally unique track ID aid INTEGER REFERENCES album, -- Album containing this track ... ); CREATE INDEX track_album ON track(aid); A typical query might be to find all tracks of an album: SELECT * FROM track WHERE aid=(SELECT aid FROM album WHERE title=?) And queries like this will run much faster using INTEGER PRIMARY KEYS rather than UUIDs. All that said, for even the largest music collection, your database is unlikely to have more than a few thousand albums and a few tens of thousands of tracks, and with such a small database and running on a modern workstations, probably just about anything you do is going to be fast enough. The optimizations described above are useful if you have tables with millions of entries or if you are doing thousands of queries per second or if you are running on some woefully underpowered portable music player. But for a personal music library running on a workstation at human-interaction speed, use whatever schema makes it easiest for you to type in correct and working code. -- D. Richard Hipp <[EMAIL PROTECTED]> ----------------------------------------------------------------------------- To unsubscribe, send email to [EMAIL PROTECTED] -----------------------------------------------------------------------------