I have a performance effect which I don't quite understand.
Maybe I'm using the wrong settings or something. Sorry for the long post,
but I wanted to include all the info that may be important.
 
My software is written in C++, runs on Windows 7/8, the SQLite database file
is either on a local SATA RAID disk or a SSD.
Typical database sizes are between 2 GB and 8 GB.
The largest tables hold several million entries. Also FTS4 is used, which
also creates large tables.
Fast internal RAID disks, SDD. Four Xeon cores. 8 GB RAM.
 
I'm using SQLite 3.8.0.2
WAL mode, shared cache enabled.
locking_mode=NORMAL
checkpoint_fullfsync=0
pragma page_size=4096
pragma cache_size=16384
 
General (retrieval) performance is excellent!
 
 
During an ingest phase, my application pumps in hundreds of thousands of
records into multiple tables.
There are massive amounts of writes during that phase, different record
sizes, tables with one to four indices etc.
 
My application is multi-threaded and inserts data into the database
concurrently from multiple threads.
The threads process data in batches, and use SQLite transactions to process
all records of a batch into the database. Transactions gain a lot of speed,
which outweighs the side effects of potential blocking.
The threads monitor the execution times of the database operations and
adjust the batch size to balance speed and transaction lock duration. Slower
operations cause smaller batches, which results in shorter database locks
and better concurrency. The system adapts fairly well to system performance
and data structure.
 
The performance was not that bad, but far from good.
 
For a given set of input data (100,000 "elements"), the execution estimate
was about 5 hours.
Database on a high-speed SSD.
The largest table holds about 5 million entries afterwards.
 
***With one single change*** I improved the execution time from 5 hours down
to about 30 minutes!
 
I changed 
 
PRAGMA synchronous=NORMAL
 
to 
 
PRAGMA synchronous=OFF
 
Also all other database write operations just 'fly' now.
I'm even more impressed with SQLite than before, but I wonder why is the
change so _dramatic_ ?
 
>From the docs my impression was that using WAL mode is ideal for bulk
inserts. That wrapping large bulks of data into smaller batches, wrapped in
BEGIN/COMMT is best for performance etc. That using synchronous=NORMAL
limits the file system flush/wait operations certain really important
operations.

I logged the execution times of various operations in this phase to a text
file. Everything was fast, the processing, the INSERTs etc.
But COMMIT operations sometimes took 20s, then 0.2s, then again 10s. That's
the time SQLite spends in the execute call with "COMMIT".

Of course the amount of data written in each transaction block varied, but
in general, 85% of the total execution time of my code was spent in the
COMMIT call.
My application was the only application with measurable disk I/O at the
time. No virus checker etc.
 
Changing to synchronous=OFF made the commits 10 times faster.
 
Is this the expected behavior or am I missing something obvious?
 

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to