Am 27.11.2008 um 09:12 schrieb Simon Bulman: > I have been playing around with SQLite to use as an alternative to > one of > our proprietary file formats used to read large amounts of data. Our > proprietary format performs very badly i.e. takes a long time to > load some > data; as expected SQLite is lighting quick in comparison - great! > > One considerable stumbling block is the footprint (size) of the > database > file on disk. It turns out that SQLite is roughly 7x larger than our > proprietary format - this is prohibitive. The data is pretty simple > really, > 2 tables > > Table 1 > > BIGINT (index), VARCHAR(30), VARCHAR(10) > > > Table 2 > > BIGINT (index), FLOAT > > > For a particular data set Table1 has 1165 rows and Table 2 has 323 > rows, > however typically Table 2 becomes bigger for larger models. The size > on disk > of this file is 11.8 Mb (compared to 1.7 Mb for our proprietary > format). I > have noticed that if I drop the indexes the size drops dramatically - > however the query performance suffers to an unacceptable level. > > For a larger model the DB footprint is 2.2 Gb compared to 267 Mb for > the > proprietary format. > > Does anybody have any comments on this? Are there any configuration > options > or ideas I could use to reduce the footprint of the db file?
I don't think you'll be able to make SQLite as efficient (regarding storage size) as a custom file format, because it has to have some overhead for indexes, etc. However, one thing that comes to mind is the way string data is stored: If you're concerned about disk space an your string data is mostly ASCII, make sure your strings are stored as UTF-8 - for ASCII string data, this will save you one byte per character in the string data storage. To enforce UTF-8 string storage, execute "PRAGMA encoding='UTF-8'" as the first command when creating the database (before you create and tables). You can query the format using "PRAGMA encoding" - UTF-16 encodings will store two bytes / character, regardless of the actual characters... Note that this doesn't mean your database size will shrink to half the size - it merely means you'll be able to fit more rows onto a single page, thus eventually you should see a decrease in file size when comparing UTF-16 vs. UTF-8 databases. BTW: are you aware that SQLite database won't shrink by themselves? You'll have to vacuum them to reclaim unused space (see <http://www.sqlite.org/faq.html#q12 >) HTH, </jum> _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users