On 7/11/07, Steve Krulewitz <[EMAIL PROTECTED]> wrote:
In the application I am working on (Songbird), we have a simple two
table schema representing the tracks in your music collection and the
properties on those tracks.  The keys used are all UUIDs (128 bit
number) which we are currently storing in hex string form in a text
column, so they literally appear in the database as
"ee89b823-e709-40c5-bed7-dcb0b2b791a8".  We do lots of lookups and
joins on these keys.

I was wondering if there is much to be gained by storing these 128 bit
values in binary rather than as strings.

Probably, because in a database space is the same as speed in many
cases.  Smaller databases will generally be faster because you can
cover more items in the same number of disk reads.

- Use a pair of integer columns

Probably a bad idea.  SQLite stores integers using a variable encoding
with 7 bits of data stored per byte.  That works well for data where
the high-order bits are mostly 0 (lots of small positive integers).
This data would be expected to have uniform distribution, so you'll
probably lose on both encoding/decoding overhead and space.

- Continue to use a text column and shove binary data into it and hope
for the best
- Use a blob column, but I wonder how efficient indexes and joins are
on this column type?

The blob column should be strictly more performant than the same-size
text column, because SQLite treats blob data as literal bytes, while
the text column may involve other interesting operations (though
generally doesn't).  Since in this case the blob column would be able
to store data more compactly, it is likely to win across the board.

- Create another table in the schema that maps integers to UUIDs, and
use these integers in the rest of the schema rather than the UUIDs.
This would add an additional join every time we want to do a query
based on UUID.

It really depends on how often this is happening.  SQLite tables are
organized by rowid.  An index on a table maps the indexed column to
rowid, so querying through an index involves traversing two
(generally) uncorrelated btrees.  If you have many tables with indices
on these UUID values, it might make sense to consolidate into a single
index and use rowids everywhere else.

You might even be able to get away without that index.  If you hash
the UUID and take the low-order 64 bits of the result, you could use
that as the rowid directly.  Even the low-order 32 bits might suffice.
[Don't just take the low-order bits of the UUID directly, since you
have no idea if they were generated appropriately, you might run into
an unacceptable level of collisions.]

-scott

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to