[EMAIL PROTECTED] wrote:
Dave Gierok <[EMAIL PROTECTED]> wrote:
It looks like the size of a Sqlite DB ends up being much larger (more than 2x) than size that I calculate for its data set.

A simple test shows that when creating one table with one integer column and filling it with 10000 rows, I get a DB size of 92KB instead of what I'd expect to be around 40KB plus some small
overhead for the table definition.  This seems to scale linearly
as I increase the amount of data in the DB.

SQLite stores 64-bit integers, not 32-bit as you suppose.  And
each row also stores a 64-bit integer rowid in addition to the
data.  So that it fits in 92KB instead of the (naively expected)
160KB suggests that SQLite is actually doing a reasonable job of
compressing the data.



I hate to disagree with the author, but that description is not quite accurate. :-)

SQLite uses variable length integer storage as described here http://www.sqlite.org/datatype3.html and Dave is only inserting 10K rows. Assuming he is entering 10K sequential integers, and the integer column is not declared as "integer primary key", he should have 2 byte rowids and 2 byte integer values, or about 40K bytes of raw data.

SQLite adds considerable overhead to the raw data in order to search and sort quickly, and to support its manifest typing and completely variable length record format. This is what causes the file size to increase to 92K.

The details of the version 2.X database file format are here http://www.sqlite.org/fileformat.html This should give you and idea of the kind of overhead data that is stored in the database file along with the raw data. To the best of my knowledge there is no equivalent document for the 3.X file format. This page http://www.sqlite.org/version3.html describes some of the changes between the 2.8 and 3.X series (in particular the variable length integer storage), and claims that typical database files are about 25% to 35% smaller in the version 3.X format. For many applications the variable length integer storage will be smaller than using fixed size 32 bit integers even though they allow larger 64 bit values to be stored.

If you just need to store 10K integers you can't beat a raw binary file holding 16 bit values, but if you want to use the fast searching and relational power of SQL then you will need to pay the cost of the additional storage overhead.

Dennis Cote



-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to