Ok, I am having trouble counting this morning.

No. of rows in Table 1: 1165
No. of rows in Table 2: 376295

Analysis is attached.

Sorry for the continued confusion.

Cheers,
S.

-----Original Message-----
From: Jens Miltner [mailto:[EMAIL PROTECTED] 
Sent: 28 November 2008 12:58
To: [EMAIL PROTECTED]
Cc: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] Database file size


On 28.11.2008 at 13:37, Simon Bulman wrote:

> Ahhh, sorry, I wrongly calculated the number of rows in table 2. It
> actually has 29581 rows. Still surprised at the 7x size increase, but
> perhaps you are not, given the overheads?

I still can't reproduce your database sizes: creating 3000 rows in
table 1 and 30000 in table 2, I end up with a database file of 745 kB.
Even with 30000 rows in table 1, I end up with a file size of only 2.9 MB...

And the overhead I was talking about is overhead that sqlite maintains  
for each row in every database, so that should be the same on your end  
and my end.

Unless your row counts are much, much higher, I suspect that either
(1) other tables contribute the major part of the database file size, or
(2) your database has lots of free space and needs to be vacuumed.

You could try running sqlite3_analyzer on your database to see the
disk space used by each table, free space, etc.
Are you using a stock sqlite3 installation or did you modify/customize
it through build settings?

</jum>
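
A quick way to check point (2) without sqlite3_analyzer is to ask SQLite how many pages sit on the freelist. The following is a minimal Python sketch and not part of the original exchange: the helper name `page_stats` and the throwaway `demo.db` file are assumptions for illustration. It relies only on the `page_size`, `page_count` and `freelist_count` pragmas.

```python
import os
import sqlite3
import tempfile


def page_stats(conn):
    """Hypothetical helper: read page statistics via standard SQLite pragmas."""
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    freelist = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return {
        "file_bytes": page_size * page_count,
        "page_count": page_count,
        "freelist_pages": freelist,
        "reclaimable_bytes": page_size * freelist,
    }


with tempfile.TemporaryDirectory() as tmp:
    # Throwaway database file; point sqlite3.connect() at a real file instead.
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v REAL)")
    conn.executemany("INSERT INTO t (v) VALUES (?)",
                     [(float(i),) for i in range(10000)])
    conn.commit()
    before = page_stats(conn)

    # Deleting rows moves pages to the freelist; the file itself does not shrink.
    conn.execute("DELETE FROM t")
    conn.commit()
    after_delete = page_stats(conn)

    # VACUUM rebuilds the file and drops the free pages.
    conn.execute("VACUUM")
    after_vacuum = page_stats(conn)
    conn.close()

print(before, after_delete, after_vacuum, sep="\n")
```

If `freelist_pages` is zero for a real database, as it is for a file that was freshly created and filled, vacuuming will not reduce its size and the space has to be going elsewhere.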


>
> -----Original Message-----
> From: Jens Miltner [mailto:[EMAIL PROTECTED]
> Sent: 28 November 2008 08:38
> To: [EMAIL PROTECTED]
> Cc: General Discussion of SQLite Database
> Subject: Re: [sqlite] Database file size
>
>
> On 28.11.2008 at 09:20, Simon Bulman wrote:
>
>> Hi Jens,
>>
>> Thanks for your input. UTF-8 did not make a difference. I expected
>> that the SQLite file would be larger on disk than our proprietary
>> format because of the overheads that you mention. I am surprised,
>> however, that it is at least 7x larger.
>
> To be honest - given your table definitions below, I'm surprised the
> database is _that_ large, too:
>
> Table 1 - according to your definition - should contain at most about
> 50 bytes of pure data per row (plus the overhead needed by SQLite).
> Table 2 would only contain ~ 16 bytes of data per row.
>
> Dividing the database disk size by the total number of rows you
> mentioned, would indicate a whopping 8k per row.
>
> I did a quick test and created a schema similar to what you outlined
> and filled it with data (the same number of rows you mentioned and 28
> and ~20 characters per row for the two varchar columns) and my
> database ended up being 71kB in size instead of the 11.8 MB you saw...
>
> Are there any other tables that contain non-negligible amounts of  
> data?
> Are the data sizes indeed what's indicated in the schema? (Since SQLite
> doesn't really care about the varchar size constraints, you can
> actually put any amount of data into a varchar(30) column.)
>
> </jum>
>
>
>>
>> I am actually recreating the whole database (delete file and
>> recreate) programmatically, so vacuuming has no effect.
>>
>> Cheers,
>> S.
>>
>> -----Original Message-----
>> From: Jens Miltner [mailto:[EMAIL PROTECTED]
>> Sent: 27 November 2008 13:48
>> To: General Discussion of SQLite Database
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: [sqlite] Database file size
>>
>>
>> On 27.11.2008 at 09:12, Simon Bulman wrote:
>>
>>> I have been playing around with SQLite to use as an alternative to
>>> one of our proprietary file formats used to read large amounts of
>>> data. Our proprietary format performs very badly, i.e. it takes a
>>> long time to load some data; as expected, SQLite is lightning quick
>>> in comparison - great!
>>>
>>> One considerable stumbling block is the footprint (size) of the
>>> database file on disk. It turns out that the SQLite file is roughly
>>> 7x larger than our proprietary format - this is prohibitive. The
>>> data is pretty simple really, 2 tables:
>>>
>>> Table 1
>>>
>>> BIGINT (index),  VARCHAR(30), VARCHAR(10)
>>>
>>>
>>> Table 2
>>>
>>> BIGINT (index), FLOAT
>>>
>>>
>>> For a particular data set, Table 1 has 1165 rows and Table 2 has 323
>>> rows; however, Table 2 typically becomes bigger for larger models.
>>> The size on disk of this file is 11.8 MB (compared to 1.7 MB for our
>>> proprietary format). I have noticed that if I drop the indexes the
>>> size drops dramatically; however, the query performance suffers to
>>> an unacceptable level.
>>>
>>> For a larger model the DB footprint is 2.2 GB compared to 267 MB for
>>> the proprietary format.
>>>
>>> Does anybody have any comments on this? Are there any configuration
>>> options or ideas I could use to reduce the footprint of the db file?
>>
>>
>> I don't think you'll be able to make SQLite as efficient (regarding
>> storage size) as a custom file format, because it has to have some
>> overhead for indexes, etc.
>>
>> However, one thing that comes to mind is the way string data is
>> stored:
>> If you're concerned about disk space and your string data is mostly
>> ASCII, make sure your strings are stored as UTF-8 - for ASCII string
>> data, this will save you one byte per character in the string data
>> storage.
>> To enforce UTF-8 string storage, execute "PRAGMA encoding='UTF-8'" as
>> the first command when creating the database (before you create any
>> tables).
>> You can query the format using "PRAGMA encoding" - UTF-16 encodings
>> will store two bytes per character, regardless of the actual
>> characters...
>>
>> Note that this doesn't mean your database size will shrink to half  
>> the
>> size - it merely means you'll be able to fit more rows onto a single
>> page, thus eventually you should see a decrease in file size when
>> comparing UTF-16 vs. UTF-8 databases.
>>
>> BTW: are you aware that SQLite databases won't shrink by themselves?
>> You'll have to vacuum them to reclaim unused space (see
>> <http://www.sqlite.org/faq.html#q12>)
>>
>> HTH,
>> </jum>
>>
>

/** Disk-Space Utilization Report For Test.db
*** As of 2008-Nov-28 12:46:11

Page size in bytes.................... 1024      
Pages in the whole file (measured).... 12121     
Pages in the whole file (calculated).. 12121     
Pages that store data................. 12121      100.0% 
Pages on the freelist (per header).... 0            0.0% 
Pages on the freelist (calculated).... 0            0.0% 
Pages of auto-vacuum overhead......... 0            0.0% 
Number of tables in the database...... 4         
Number of indices..................... 2         
Number of named indices............... 1         
Automatically generated indices....... 1         
Size of the file in bytes............. 12411904  
Bytes of user payload stored.......... 3982009     32.1% 

*** Page counts for all tables with their indices ********************

SUMMARY_VECTORS....................... 12066       99.55% 
SUMMARY_IDS........................... 53           0.44% 
SQLITE_MASTER......................... 1            0.008% 
START_DATE............................ 1            0.008% 

*** All tables and indices *******************************************

Percentage of total database.......... 100.0%    
Number of entries..................... 754926    
Bytes of storage consumed............. 12411904  
Bytes of payload...................... 8190844     66.0% 
Average payload per entry............. 10.85     
Average unused bytes per entry........ 0.88      
Average fanout........................ 97.00     
Fragmentation.........................  15.8%    
Maximum payload per entry............. 150       
Entries that use overflow............. 0            0.0% 
Index pages used...................... 64        
Primary pages used.................... 12057     
Overflow pages used................... 0         
Total pages used...................... 12121     
Unused bytes on index pages........... 9468        14.4% 
Unused bytes on primary pages......... 656111       5.3% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 665579       5.4% 

*** All tables *******************************************************

Percentage of total database..........  51.5%    
Number of entries..................... 377466    
Bytes of storage consumed............. 6395904   
Bytes of payload...................... 3982574     62.3% 
Average payload per entry............. 10.55     
Average unused bytes per entry........ 0.16      
Average fanout........................ 97.00     
Fragmentation.........................   4.2%    
Maximum payload per entry............. 150       
Entries that use overflow............. 0            0.0% 
Index pages used...................... 64        
Primary pages used.................... 6182      
Overflow pages used................... 0         
Total pages used...................... 6246      
Unused bytes on index pages........... 9468        14.4% 
Unused bytes on primary pages......... 51253        0.81% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 60721        0.95% 

*** All indices ******************************************************

Percentage of total database..........  48.5%    
Number of entries..................... 377460    
Bytes of storage consumed............. 6016000   
Bytes of payload...................... 4208270     70.0% 
Average payload per entry............. 11.15     
Average unused bytes per entry........ 1.60      
Fragmentation.........................  28.0%    
Maximum payload per entry............. 12        
Entries that use overflow............. 0            0.0% 
Primary pages used.................... 5875      
Overflow pages used................... 0         
Total pages used...................... 5875      
Unused bytes on primary pages......... 604858      10.1% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 604858      10.1% 

*** Table SQLITE_MASTER **********************************************

Percentage of total database..........   0.008%  
Number of entries..................... 5         
Bytes of storage consumed............. 1024      
Bytes of payload...................... 565         55.2% 
Average payload per entry............. 113.00    
Average unused bytes per entry........ 65.60     
Maximum payload per entry............. 150       
Entries that use overflow............. 0            0.0% 
Primary pages used.................... 1         
Overflow pages used................... 0         
Total pages used...................... 1         
Unused bytes on primary pages......... 328         32.0% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 328         32.0% 

*** Table START_DATE *************************************************

Percentage of total database..........   0.008%  
Number of entries..................... 1         
Bytes of storage consumed............. 1024      
Bytes of payload...................... 8            0.78% 
Average payload per entry............. 8.00      
Average unused bytes per entry........ 1004.00   
Maximum payload per entry............. 8         
Entries that use overflow............. 0            0.0% 
Primary pages used.................... 1         
Overflow pages used................... 0         
Total pages used...................... 1         
Unused bytes on primary pages......... 1004        98.0% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 1004        98.0% 

*** Table SUMMARY_IDS and all its indices ****************************

Percentage of total database..........   0.44%   
Number of entries..................... 2330      
Bytes of storage consumed............. 54272     
Bytes of payload...................... 39271       72.4% 
Average payload per entry............. 16.85     
Average unused bytes per entry........ 2.17      
Average fanout........................ 33.00     
Fragmentation.........................  96.2%    
Maximum payload per entry............. 34        
Entries that use overflow............. 0            0.0% 
Index pages used...................... 1         
Primary pages used.................... 52        
Overflow pages used................... 0         
Total pages used...................... 53        
Unused bytes on index pages........... 759         74.1% 
Unused bytes on primary pages......... 4296         8.1% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 5055         9.3% 

*** Table SUMMARY_IDS w/o any indices ********************************

Percentage of total database..........   0.28%   
Number of entries..................... 1165      
Bytes of storage consumed............. 34816     
Bytes of payload...................... 27469       78.9% 
Average payload per entry............. 23.58     
Average unused bytes per entry........ 0.96      
Average fanout........................ 33.00     
Fragmentation.........................  97.0%    
Maximum payload per entry............. 34        
Entries that use overflow............. 0            0.0% 
Index pages used...................... 1         
Primary pages used.................... 33        
Overflow pages used................... 0         
Total pages used...................... 34        
Unused bytes on index pages........... 759         74.1% 
Unused bytes on primary pages......... 361          1.1% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 1120         3.2% 

*** Indices of table SUMMARY_IDS *************************************

Percentage of total database..........   0.16%   
Number of entries..................... 1165      
Bytes of storage consumed............. 19456     
Bytes of payload...................... 11802       60.7% 
Average payload per entry............. 10.13     
Average unused bytes per entry........ 3.38      
Fragmentation......................... 100.0%    
Maximum payload per entry............. 11        
Entries that use overflow............. 0            0.0% 
Primary pages used.................... 19        
Overflow pages used................... 0         
Total pages used...................... 19        
Unused bytes on primary pages......... 3935        20.2% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 3935        20.2% 

*** Table SUMMARY_VECTORS and all its indices ************************

Percentage of total database..........  99.55%   
Number of entries..................... 752590    
Bytes of storage consumed............. 12355584  
Bytes of payload...................... 8151000     66.0% 
Average payload per entry............. 10.83     
Average unused bytes per entry........ 0.88      
Average fanout........................ 98.00     
Fragmentation.........................  15.4%    
Maximum payload per entry............. 17        
Entries that use overflow............. 0            0.0% 
Index pages used...................... 63        
Primary pages used.................... 12003     
Overflow pages used................... 0         
Total pages used...................... 12066     
Unused bytes on index pages........... 8709        13.5% 
Unused bytes on primary pages......... 650483       5.3% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 659192       5.3% 

*** Table SUMMARY_VECTORS w/o any indices ****************************

Percentage of total database..........  51.2%    
Number of entries..................... 376295    
Bytes of storage consumed............. 6359040   
Bytes of payload...................... 3954532     62.2% 
Average payload per entry............. 10.51     
Average unused bytes per entry........ 0.15      
Average fanout........................ 98.00     
Fragmentation.........................   3.8%    
Maximum payload per entry............. 17        
Entries that use overflow............. 0            0.0% 
Index pages used...................... 63        
Primary pages used.................... 6147      
Overflow pages used................... 0         
Total pages used...................... 6210      
Unused bytes on index pages........... 8709        13.5% 
Unused bytes on primary pages......... 49560        0.79% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 58269        0.92% 

*** Indices of table SUMMARY_VECTORS *********************************

Percentage of total database..........  48.3%    
Number of entries..................... 376295    
Bytes of storage consumed............. 5996544   
Bytes of payload...................... 4196468     70.0% 
Average payload per entry............. 11.15     
Average unused bytes per entry........ 1.60      
Fragmentation.........................  27.8%    
Maximum payload per entry............. 12        
Entries that use overflow............. 0            0.0% 
Primary pages used.................... 5856      
Overflow pages used................... 0         
Total pages used...................... 5856      
Unused bytes on primary pages......... 600923      10.0% 
Unused bytes on overflow pages........ 0         
Unused bytes on all pages............. 600923      10.0% 

*** Definitions ******************************************************

Page size in bytes

    The number of bytes in a single page of the database file.  
    Usually 1024.

Number of pages in the whole file

    The number of 1024-byte pages that go into forming the complete
    database

Pages that store data

    The number of pages that store data, either as primary B*Tree pages or
    as overflow pages.  The number at the right is the data pages divided by
    the total number of pages in the file.

Pages on the freelist

    The number of pages that are not currently in use but are reserved for
    future use.  The percentage at the right is the number of freelist pages
    divided by the total number of pages in the file.

Pages of auto-vacuum overhead

    The number of pages that store data used by the database to facilitate
    auto-vacuum. This is zero for databases that do not support auto-vacuum.

Number of tables in the database

    The number of tables in the database, including the SQLITE_MASTER table
    used to store schema information.

Number of indices

    The total number of indices in the database.

Number of named indices

    The number of indices created using an explicit CREATE INDEX statement.

Automatically generated indices

    The number of indices used to implement PRIMARY KEY or UNIQUE constraints
    on tables.

Size of the file in bytes

    The total amount of disk space used by the entire database file.

Bytes of user payload stored

    The total number of bytes of user payload stored in the database. The
    schema information in the SQLITE_MASTER table is not counted when
    computing this number.  The percentage at the right shows the payload
    divided by the total file size.

Percentage of total database

    The amount of the complete database file that is devoted to storing
    information described by this category.

Number of entries

    The total number of B-Tree key/value pairs stored under this category.

Bytes of storage consumed

    The total amount of disk space required to store all B-Tree entries
    under this category.  This is the total number of pages used times
    the page size.

Bytes of payload

    The amount of payload stored under this category.  Payload is the data
    part of table entries and the key part of index entries.  The percentage
    at the right is the bytes of payload divided by the bytes of storage 
    consumed.

Average payload per entry

    The average amount of payload on each entry.  This is just the bytes of
    payload divided by the number of entries.

Average unused bytes per entry

    The average amount of free space remaining on all pages under this
    category on a per-entry basis.  This is the number of unused bytes on
    all pages divided by the number of entries.

Fragmentation

    The percentage of pages in the table or index that are not
    consecutive in the disk file.  Many filesystems are optimized
    for sequential file access so smaller fragmentation numbers 
    sometimes result in faster queries, especially for larger
    database files that do not fit in the disk cache.

Maximum payload per entry

    The largest payload size of any entry.

Entries that use overflow

    The number of entries that use one or more overflow pages.

Total pages used

    This is the number of pages used to hold all information in the current
    category.  This is the sum of index, primary, and overflow pages.

Index pages used

    This is the number of pages in a table B-tree that hold only key (rowid)
    information and no data.

Primary pages used

    This is the number of B-tree pages that hold both key and data.

Overflow pages used

    The total number of overflow pages used for this category.

Unused bytes on index pages

    The total number of bytes of unused space on all index pages.  The
    percentage at the right is the number of unused bytes divided by the
    total number of bytes on index pages.

Unused bytes on primary pages

    The total number of bytes of unused space on all primary pages.  The
    percentage at the right is the number of unused bytes divided by the
    total number of bytes on primary pages.

Unused bytes on overflow pages

    The total number of bytes of unused space on all overflow pages.  The
    percentage at the right is the number of unused bytes divided by the
    total number of bytes on overflow pages.

Unused bytes on all pages

    The total number of bytes of unused space on all primary and overflow 
    pages.  The percentage at the right is the number of unused bytes 
    divided by the total number of bytes.

**********************************************************************
The entire text of this report can be sourced into any SQL database
engine for further analysis.  All of the text above is an SQL comment.
The data used to generate this report follows:
*/
BEGIN;
CREATE TABLE space_used(
   name clob,        -- Name of a table or index in the database file
   tblname clob,     -- Name of associated table
   is_index boolean, -- TRUE if it is an index, false for a table
   nentry int,       -- Number of entries in the BTree
   leaf_entries int, -- Number of leaf entries
   payload int,      -- Total amount of data stored in this table or index
   ovfl_payload int, -- Total amount of data stored on overflow pages
   ovfl_cnt int,     -- Number of entries that use overflow
   mx_payload int,   -- Maximum payload size
   int_pages int,    -- Number of interior pages used
   leaf_pages int,   -- Number of leaf pages used
   ovfl_pages int,   -- Number of overflow pages used
   int_unused int,   -- Number of unused bytes on interior pages
   leaf_unused int,  -- Number of unused bytes on primary pages
   ovfl_unused int,  -- Number of unused bytes on overflow pages
   gap_cnt int       -- Number of gaps in the page layout
);
INSERT INTO space_used 
VALUES('sqlite_master','sqlite_master',0,5,5,565,0,0,150,0,1,0,0,328,0,0);
INSERT INTO space_used 
VALUES('start_date','start_date',0,1,1,8,0,0,8,0,1,0,0,1004,0,0);
INSERT INTO space_used 
VALUES('summary_ids','summary_ids',0,1197,1165,27469,0,0,34,1,33,0,759,361,0,32);
INSERT INTO space_used 
VALUES('summary_vectors','summary_vectors',0,382441,376295,3954532,0,0,17,63,6147,0,8709,49560,0,233);
INSERT INTO space_used 
VALUES('sqlite_autoindex_summary_ids_1','summary_ids',1,1165,1165,11802,0,0,11,0,19,0,0,3935,0,18);
INSERT INTO space_used 
VALUES('summary_id','summary_vectors',1,376295,376295,4196468,0,0,12,0,5856,0,0,600923,0,1628);
COMMIT;
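
The report notes that its data can be loaded into an SQL engine for further analysis. As a rough illustration of what that might look like, here is a small Python sketch that copies only the page-count columns from the INSERT statements above into an in-memory SQLite database and ranks each table and index by the bytes it occupies. The reduced column set and the `ROWS` and `usage` names are assumptions made for brevity; they are not part of the analyzer output.

```python
import sqlite3

PAGE_SIZE = 1024  # "Page size in bytes" from the report header

# (name, tblname, is_index, int_pages, leaf_pages, ovfl_pages),
# copied from the INSERT statements in the report.
ROWS = [
    ("sqlite_master", "sqlite_master", 0, 0, 1, 0),
    ("start_date", "start_date", 0, 0, 1, 0),
    ("summary_ids", "summary_ids", 0, 1, 33, 0),
    ("summary_vectors", "summary_vectors", 0, 63, 6147, 0),
    ("sqlite_autoindex_summary_ids_1", "summary_ids", 1, 0, 19, 0),
    ("summary_id", "summary_vectors", 1, 0, 5856, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE space_used "
    "(name, tblname, is_index, int_pages, leaf_pages, ovfl_pages)"
)
conn.executemany("INSERT INTO space_used VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Storage consumed = total pages used times the page size.
usage = conn.execute(
    """
    SELECT name, is_index, (int_pages + leaf_pages + ovfl_pages) * ? AS bytes
    FROM space_used
    ORDER BY bytes DESC
    """,
    (PAGE_SIZE,),
).fetchall()

total = sum(nbytes for _, _, nbytes in usage)
for name, is_index, nbytes in usage:
    kind = "index" if is_index else "table"
    share = 100.0 * nbytes / total
    print(f"{name:32s} {kind:5s} {nbytes:>10d} bytes {share:5.1f}%")
```

The totals agree with the report: the summary_vectors table and its summary_id index take about 51% and 48% of the 12411904-byte file, which works out to roughly 33 bytes of storage per row for the 376295 rows once table and index are counted together.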
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
