-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/03/12 02:59, Larry Knibb wrote:
> Currently my database is 41.5MB with indexes and to reduce the download
> I'm zipping it with max. compression Deflate down to 18.4MB.
> 
> I noticed I can get it down to 12.1MB using LZMA compression but 
> unfortunately there's no LZMA library for Windows Phone yet... but
> anyway.

zip uses a compression window size of 32kb. bzip2 is 900kb.  LZMA on
maximum can be gigabytes.

> I further noticed that if I .dump the SQL to create the database then
> zip that, the resulting size is just 6.7MB;

A user on a mailing list I was on was trying to save space like you and
found the same pattern you did.  A raw SQLite database has padding due to
page size and alignment, plus prefix bytes to values (eg giving
string/blob lengths), not to mention the btree structure for the records.

SQL text was more efficient because there is no padding/alignment/btree.
However every line has a redundant "insert into table values ..." so CSV
is even more efficient as it is using one byte to mark the end of each
value (the comma).  His data had large numbers somewhat sequential and
they take more space as text (digits), so the CSV was adapted to be the
deltas between values/rows which was usually a digit or so.  bzip2 was
used on the final delta CSV to get the smallest size.  LZMA gained a few
more percent on bzip2 in size but not enough to bother using.

What works best for you will depend on what your data looks like, how much
it varies, the types etc.  If there is a lot of text then you can also
look to the current best text compressors - see

  http://mattmahoney.net/dc/text.html

You may also be able to save even more space by having the data a column
at a time instead of a row at a time, although it would require
considerably more temporary space to work out each row before
reconstructing the database.

Similarly you may be able to use a custom binary delta encoding that uses
fewer bits to encode each row.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk9U+tMACgkQmOOfHg372QT8dgCfXRV3OwrDE+oUksDNCy+GAccu
h9YAoJ6LaZY7521lfj94BoIxGp1wMpS1
=rxdX
-----END PGP SIGNATURE-----
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to