Re: [sqlite] large bulk insert speed observations when page_size and cache_size are varied

RaghavendraK 70574 Tue, 18 Sep 2007 08:21:42 -0700

Why is the CPU shot up by 100% (relative)when the full db is in mem? This can 
be a concern.I understand there is not much i/o but 100% is too high.


regards
ragha
******************************************************************************************
 This email and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
 
*****************************************************************************************

----- Original Message -----
From: Joe Wilson <[EMAIL PROTECTED]>
Date: Tuesday, September 18, 2007 8:28 pm
Subject: [sqlite] large bulk insert speed observations when page_size and 
cache_size are varied

> The bash script below attempts to quantify the relationship between 
> SQLite (3.5.0 alpha) page_size and cache_size towards the speed of 
> populating a large table with many indexes.
> 
> First some rough metrics on the final populated database:
> The database is approximately 208M in size, consisting of a single
> table, foo, and its associated indexes. The table 'foo' ends up 
> having 
> 177,147 randomly populated rows of a few hundred bytes each. 
> Its schema can be gleaned from the script. The script builds a brand
> new database from scratch for each page_size/cache_size combination 
> for each run to largely eliminate operating system cache effects.
> 
> #!/bin/bash
> SQLITE=./sqlite3
> for S in 4096 8192 16384 32768; do
> for C in 1000 2000 4000 8000 16000; do
> rm -f foo.db foo.csv
> echo "
>  pragma page_size=$S;
>  pragma default_cache_size=$C;
>  CREATE TABLE foo(
>    a text unique,
>    b text unique not null,
>    c text unique not null,
>    primary key(c,a,b)
>  );
>  CREATE INDEX foo_ba on foo(b,a);
> " | $SQLITE foo.db
> $SQLITE >foo.csv <<EOF
>  create view v1 as select 1 union all select 2 union all select 3;
>  select hex(randomblob(17)),
>         hex(randomblob(13)),
>         hex(randomblob(131))
>  from v1,v1,v1,v1,v1,v1,v1,v1,v1,v1,v1;
> EOF
> echo -n "page_size $S, cache_size $C : "
> time $SQLITE foo.db ".import foo.csv foo" 2>&1 | head -1
> done
> done
> 
> The raw output of the script on a 512M Linux box:
> 
> page_size 4096, cache_size 1000 : 39.72user 30.88system 
> 4:32.46elapsed 25%CPU
> page_size 4096, cache_size 2000 : 40.67user 26.34system 
> 4:46.75elapsed 23%CPU
> page_size 4096, cache_size 4000 : 40.47user 21.61system 
> 4:34.05elapsed 22%CPU
> page_size 4096, cache_size 8000 : 41.80user 13.37system 
> 4:55.15elapsed 18%CPU
> page_size 4096, cache_size 16000 : 42.23user 7.58system 
> 4:10.79elapsed 19%CPU
> page_size 8192, cache_size 1000 : 40.62user 37.50system 
> 3:11.05elapsed 40%CPU
> page_size 8192, cache_size 2000 : 43.01user 26.60system 
> 3:04.52elapsed 37%CPU
> page_size 8192, cache_size 4000 : 42.85user 16.55system 
> 2:57.13elapsed 33%CPU
> page_size 8192, cache_size 8000 : 43.62user 8.08system 
> 2:34.28elapsed 33%CPU
> page_size 8192, cache_size 16000 : 43.11user 2.75system 
> 1:48.53elapsed 42%CPU
> page_size 16384, cache_size 1000 : 43.07user 47.92system 
> 2:19.82elapsed 65%CPU
> page_size 16384, cache_size 2000 : 42.41user 31.77system 
> 1:59.79elapsed 61%CPU
> page_size 16384, cache_size 4000 : 42.38user 18.70system 
> 1:47.69elapsed 56%CPU
> page_size 16384, cache_size 8000 : 41.83user 9.06system 
> 1:18.35elapsed 64%CPU
> page_size 16384, cache_size 16000 : 41.28user 8.36system 
> 1:00.16elapsed 82%CPU
> page_size 32768, cache_size 1000 : 44.19user 52.55system 
> 2:03.40elapsed 78%CPU
> page_size 32768, cache_size 2000 : 43.15user 27.36system 
> 1:35.95elapsed 73%CPU
> page_size 32768, cache_size 4000 : 43.18user 11.14system 
> 1:10.48elapsed 77%CPU
> page_size 32768, cache_size 8000 : 42.91user 10.34system 
> 1:04.69elapsed 82%CPU
> page_size 32768, cache_size 16000 : 42.87user 10.28system 
> 1:02.35elapsed 85%CPU
> 
> The output regrouped by equal sized page_size x cache_size buckets, 
> for 
> equivalent total sqlite cache size per run:
> 
> -- ~16M total cache (1/16th of database pages in cache)
> page_size 4096, cache_size 4000 : 40.47user 21.61system 
> 4:34.05elapsed 22%CPU
> page_size 8192, cache_size 2000 : 43.01user 26.60system 
> 3:04.52elapsed 37%CPU
> page_size 16384, cache_size 1000 : 43.07user 47.92system 
> 2:19.82elapsed 65%CPU
> 
> -- ~32M total cache (1/8th of database pages in cache)
> page_size 4096, cache_size 8000 : 41.80user 13.37system 
> 4:55.15elapsed 18%CPU
> page_size 8192, cache_size 4000 : 42.85user 16.55system 
> 2:57.13elapsed 33%CPU
> page_size 16384, cache_size 2000 : 42.41user 31.77system 
> 1:59.79elapsed 61%CPU
> page_size 32768, cache_size 1000 : 44.19user 52.55system 
> 2:03.40elapsed 78%CPU
> 
> -- ~64M total cache (a quarter of database pages in cache)
> page_size 4096, cache_size 16000 : 42.23user 7.58system 
> 4:10.79elapsed 19%CPU
> page_size 8192, cache_size 8000 : 43.62user 8.08system 
> 2:34.28elapsed 33%CPU
> page_size 16384, cache_size 4000 : 42.38user 18.70system 
> 1:47.69elapsed 56%CPU
> page_size 32768, cache_size 2000 : 43.15user 27.36system 
> 1:35.95elapsed 73%CPU
> 
> -- ~128M total cache (half of database pages in cache)
> page_size 8192, cache_size 16000 : 43.11user 2.75system 
> 1:48.53elapsed 42%CPU
> page_size 16384, cache_size 8000 : 41.83user 9.06system 
> 1:18.35elapsed 64%CPU
> page_size 32768, cache_size 4000 : 43.18user 11.14system 
> 1:10.48elapsed 77%CPU
> 
> -- ~256M total cache (all database pages in cache)
> page_size 16384, cache_size 16000 : 41.28user 8.36system 
> 1:00.16elapsed 82%CPU
> page_size 32768, cache_size 8000 : 42.91user 10.34system 
> 1:04.69elapsed 82%CPU
> 
> Here we see the output of the script in tabular form:
> 
>           INSERT Wall Clock Time In Seconds
>                  (lower is better)
> 
>                         page_size
> 
>                   4096  8192  16384  32768
>                   ----  ----  -----  -----
>            16000   250   108     60     62
>             8000   295   154     78     64
> cache_size  4000   274   177    107     70
>             2000   286   184    119     95
>             1000   272   191    139    123
> 
> One might expect to see similar timings for equal page_size x
> cache_size values, but this is not the case. 
> Take, for example, the case where a total of roughly 64M of 
> sqlite cache is used. Elapsed time varies widely.
> 
> -- ~64M total cache (a quarter of database pages in cache)
> page_size 4096, cache_size 16000 : 42.23user 7.58system 
> 4:10.79elapsed 19%CPU
> page_size 8192, cache_size 8000 : 43.62user 8.08system 
> 2:34.28elapsed 33%CPU
> page_size 16384, cache_size 4000 : 42.38user 18.70system 
> 1:47.69elapsed 56%CPU
> page_size 32768, cache_size 2000 : 43.15user 27.36system 
> 1:35.95elapsed 73%CPU
> 
> We see that when only 1/4 of the database is able to fit into sqlite
> cache, it can be 2.6 times more speed efficient to favor a large 
> page_size over a large cache size to produce an equivalent amount of
> sqlite cache. This supports the argument that sqlite performs 
> better when 
> locality of reference improves.
> 
> When the entire database can fit into sqlite cache, however, we can 
> seethat having a smaller page size can be marginally more speed 
> efficient:
> -- ~256M total cache (all database pages in cache)
> page_size 32768, cache_size 8000 : 42.91user 10.34system 
> 1:04.69elapsed 82%CPU
> page_size 16384, cache_size 16000 : 41.28user 8.36system 
> 1:00.16elapsed 82%CPU
> page_size = 4096, cache_size = 64000 : 41.92user 2.48system 
> 0:58.87elapsed 75%CPU (*)
> (*) last row required a re-run of the script with new parameters
> 
> These figures are naturally schema and data dependent, but they do 
> highlightsome basic trends.
> 
> I maintain that pre-reserving contiguous blocks of pages for each 
> tableand index would considerably improve sqlite insert speed into 
> multi-index
> tables. If this were the case, locality of reference could be 
> improved on
> a per table/index basis, and you might be able to use smaller page 
> sizesfor further speed gains. If wasted space is a concern, the 
> unused pre-reserved 
> pages could be collected with a VACUUM after table population.
> 
> 
>       
> ____________________________________________________________________________________
> Yahoo! oneSearch: Finally, mobile search 
> that gives answers, not web links. 
> http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC
> 
> --------------------------------------------------------------------
> ---------
> To unsubscribe, send email to [EMAIL PROTECTED]
> --------------------------------------------------------------------
> ---------
> 
> 

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] large bulk insert speed observations when page_size and cache_size are varied

Reply via email to