Some data using python+apsw for 10,000,000 records:

- Embedding the values in the SQL (i.e., as text statements) and preparing a "new" statement each time is the least efficient (see the sketch below).
- Multiple values embedded in the SQL are more efficient, up to about 50-100 values per statement (roughly 50% faster).
- Preparing a single statement and reusing it, with the data bound one row at a time, is more efficient still.
- The last case (a list of prepared bindings passed in one call) is not significantly different from the "one at a time" bound insert.
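Before the numbers, a minimal sketch of the two extremes being compared (illustrative only; it assumes the same test table as the full script below):

import apsw

db = apsw.Connection('test.db')
cr = db.cursor()
cr.execute('create table if not exists test ( x text unique )')

# Least efficient: the value is embedded in the SQL text, so every
# row produces a different statement that has to be parsed and
# prepared from scratch.
cr.execute("insert into test values ('%s')" % ('abcdefgh',))

# Most efficient: the SQL text never changes, so the prepared
# statement is reused (apsw keeps a statement cache) and only the
# binding varies per row.
cr.execute("insert into test values (?)", ('hgfedcba',))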
No significant difference was observed with WAL journalling.

Generated 10000000 random values in 50.9100000858 seconds
Inserted 10000000 records individually parsed in 154.308000088 seconds
Inserted 10000000 in groups of 2 in 158.332999945 seconds
Inserted 10000000 in groups of 5 in 104.661999941 seconds
Inserted 10000000 in groups of 10 in 86.7520000935 seconds
Inserted 10000000 in groups of 25 in 76.3989999294 seconds
Inserted 10000000 in groups of 50 in 73.9539999962 seconds
Inserted 10000000 in groups of 100 in 72.7170000076 seconds
Inserted 10000000 in groups of 500 in 76.751999855 seconds
Inserted 10000000 in groups of 1000 in 77.3090000153 seconds
Inserted 10000000 in groups of 5000 in 76.0599999428 seconds
Inserted 10000000 in groups of 10000 in 74.4079999924 seconds
Inserted 10000000 records via binding in 59.3770000935 seconds
Inserted 10000000 records via single binding list in 53.5610001087 seconds

Generated by:

import random
import time

import apsw

recs = 10000000

db = apsw.Connection('test.db')
cr = db.cursor()

chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Generate `recs` unique random 8-character strings.
data = set()
start = time.time()
while len(data) < recs:
    item = ''
    for j in xrange(8):
        item += chars[int(random.random() * 52)]
    data.add(item)
print 'Generated', recs, 'random values in', time.time() - start, 'seconds'

data = list(data)
mdata = [(x,) for x in data]

# Case 1: value embedded in the SQL text; every row is parsed anew.
cr.execute('drop table if exists test; create table test ( x text unique ); vacuum;')
start = time.time()
cr.execute('begin')
for i in xrange(recs):
    cr.execute("insert into test values ('%s')" % (data[i],))
cr.execute('commit')
print 'Inserted', recs, 'records individually parsed in', time.time() - start, 'seconds'

# Case 2: multiple rows embedded per statement, in varying group sizes.
for steps in [2, 5, 10, 25, 50, 100, 500, 1000, 5000, 10000]:
    cr.execute('drop table if exists test; create table test ( x text unique ); vacuum;')
    start = time.time()
    cr.execute('begin')
    for i in xrange(0, recs, steps):
        sql = "insert into test values ('" + "'),('".join(data[i:i+steps]) + "')"
        cr.execute(sql)
    cr.execute('commit')
    print 'Inserted', recs, 'in groups of', steps, 'in', time.time() - start, 'seconds'

# Case 3: one statement reused, one bound value per execute.
cr.execute('drop table if exists test; create table test ( x text unique ); vacuum;')
start = time.time()
cr.execute('begin')
for i in xrange(recs):
    cr.execute("insert into test values (?)", (data[i],))
cr.execute('commit')
print 'Inserted', recs, 'records via binding in', time.time() - start, 'seconds'

# Case 4: one statement, all bindings supplied as a list via executemany.
cr.execute('drop table if exists test; create table test ( x text unique ); vacuum;')
start = time.time()
cr.execute('begin')
cr.executemany("insert into test values (?)", mdata)
cr.execute('commit')
print 'Inserted', recs, 'records via single binding list in', time.time() - start, 'seconds'
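To repeat the runs under WAL journalling (where, as noted above, no significant difference was seen), the journal mode can be switched before the timed sections; a sketch, using the connection from the script above:

# Switch to write-ahead logging before the timed inserts; the pragma
# returns the resulting mode, which can be checked.
print cr.execute('pragma journal_mode=wal').fetchall()   # -> [('wal',)]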
---
Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why.

>-----Original Message-----
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Jan Slodicka
>Sent: Friday, 16 January, 2015 10:04
>To: sqlite-users@sqlite.org
>Subject: Re: [sqlite] Huge WAL log
>
>Simon Slavin-3 wrote
>>> Thanks to your post I discovered multiple-row inserts so that I now
>>> understand what you asked.
>>
>> Just a note that multiple-row inserts were added to SQLite relatively
>> recently (2012-03-20 (3.7.11)) and, because SQLite does only
>> database-level locking, its overhead for INSERTs is far less than that
>> of SQL Server. It might be faster to use them, but I wouldn't expect it
>> to be so much faster than many inserts as part of one transaction.
>
>I made a fast, perhaps oversimplified, TestA:
>
>Create an empty database (file based).
>CREATE TABLE Test (city NVARCHAR(120) NULL COLLATE NOCASE)
>BEGIN
>Then I inserted N records using commands such as
>INSERT INTO Test VALUES('_random_') (_random_ was a random 8-character
>string).
>COMMIT
>Measure WAL size.
>Close the DB.
>Measure DB size.
>
>TestB was performed with the same data, except the records were grouped:
>INSERT INTO Test VALUES('_random_'),('_random1'),...
>
>I tested different groupings (1, 2, 5, 10) and different N values
>(100000 - 2500000).
>
>Results:
>- The more records are grouped, the faster.
>- Grouping of 10 records was more than 2x faster than no grouping at all.
>- WAL size did not depend on the grouping used and was just slightly
>larger than the DB size.
>
>Then I modified the test by adding an index on the single column. I ran
>2 sets of tests - one where the index was created before the first
>insert and one with the index created after all inserts finished.
>
>Results:
>- Active index: WAL size ~ DB size
>- Inactive index: WAL size ~ 50% of the DB size
>- Tests with an active index were slower by 15-20%
>
>Conclusion:
>Switching off the indexing during a bulk insert brings minor advantages.
>Multi-row inserts may bring larger advantages.
>The reason for the "huge WAL problem" remains unclear.
>
>Note:
>The tests were programmed in C# code that called the native SQLite.dll
>and were run on a W7 desktop. The results may not apply to other
>environments.
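For reference, the two index arrangements compared in the quoted test look roughly like this in Python/apsw (a sketch only: the original test was C# against the native SQLite.dll, and the table name, index name, and stand-in data here are illustrative):

import apsw

db = apsw.Connection('test.db')
cr = db.cursor()
cr.execute('create table if not exists Test ( city text collate nocase )')

rows = [('city%08d' % i,) for i in xrange(100000)]  # stand-in data

# "Active index": the index exists while the rows are inserted,
# so every insert also updates the index (and the WAL).
cr.execute('create index if not exists Test_city on Test(city)')
cr.execute('begin')
cr.executemany('insert into Test values (?)', rows)
cr.execute('commit')

# "Inactive index": drop it, bulk-insert, then rebuild afterwards,
# which writes the index in one pass instead of row by row.
cr.execute('delete from Test')
cr.execute('drop index if exists Test_city')
cr.execute('begin')
cr.executemany('insert into Test values (?)', rows)
cr.execute('commit')
cr.execute('create index Test_city on Test(city)')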