Hi list,

I'm trying to understand unexplained table bloat I found in what should be
a very small table in an old database file. If you dump/restore the
database, the problem goes away. If you duplicate the table, the problem
propagates. Schema:

CREATE TABLE copied(
  id_local INT,
  LUTFullString,
  LUTHash
);

Test Data:
85960605|,|0DE19F8AA100D2962FF22C60B3FA8940
85960465|,|A44615408E8B3E48A684C60CA2967031
85960451|,|33C8804354984071A740788AD308B279

In my test database file, each of these 3 rows is allocating about 40 4k
overflow pages for a total database size of about 500k. The full database
has about 4MB of actual data which takes up over 500MB on disk. If you want
to see/reproduce the problem you'll need my test database file which I've
uploaded here: https://mackman.net/200k-per-row.sqlite.zip (500kb
download). I don't know why it doesn't compress better, those extra
overflow pages must be getting filled with random garbage.

My uninformed guess is there was a bug in the version of sqlite used at
database creation time that computed an incorrect overflow threshold and is
storing each byte of the row to its own page. Since the problem goes away
with a dump/restore, I'm considering releasing a script to do that and
mitigate the problem for affected users. Before doing that I would like to
understand the problem better.

Thanks for reading, Ryan

PS: Here's some output from my debug session showing the 123 bytes of data
is occupying 582k of space on disk in a freshly created table.

% sqlite3 200k-per-row.sqlite
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
sqlite> .tables
copied

sqlite> .schema copied
CREATE TABLE copied(
  id_local INT,
  LUTFullString,
  LUTHash
);

sqlite> select * from copied;
85960605|,|0DE19F8AA100D2962FF22C60B3FA8940
85960465|,|A44615408E8B3E48A684C60CA2967031
85960451|,|33C8804354984071A740788AD308B279

sqlite> select sum(length(id_local) + length(lutfullstring) +
length(luthash)) from copied;
123

sqlite> create table copied2 as select * from copied;
sqlite> SELECT name, SUM("pgsize") as sz FROM dbstat group by name;
copied|581632
copied2|581632
sqlite_master|4096
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to