I have a performance effect which I don't quite understand. Maybe I'm using the wrong settings or something. Sorry for the long post, but I wanted to include all the info that may be important. My software is written in C++, runs on Windows 7/8, the SQLite database file is either on a local SATA RAID disk or a SSD. Typical database sizes are between 2 GB and 8 GB. The largest tables hold several million entries. Also FTS4 is used, which also creates large tables. Fast internal RAID disks, SDD. Four Xeon cores. 8 GB RAM. I'm using SQLite 3.8.0.2 WAL mode, shared cache enabled. locking_mode=NORMAL checkpoint_fullfsync=0 pragma page_size=4096 pragma cache_size=16384 General (retrieval) performance is excellent! During an ingest phase, my application pumps in hundreds of thousands of records into multiple tables. There are massive amounts of writes during that phase, different record sizes, tables with one to four indices etc. My application is multi-threaded and inserts data into the database concurrently from multiple threads. The threads process data in batches, and use SQLite transactions to process all records of a batch into the database. Transactions gain a lot of speed, which outweighs the side effects of potential blocking. The threads monitor the execution times of the database operations and adjust the batch size to balance speed and transaction lock duration. Slower operations cause smaller batches, which results in shorter database locks and better concurrency. The system adapts fairly well to system performance and data structure. The performance was not that bad, but far from good. For a given set of input data (100,000 "elements"), the execution estimate was about 5 hours. Database on a high-speed SSD. The largest table holds about 5 million entries afterwards. ***With one single change*** I improved the execution time from 5 hours down to about 30 minutes! I changed PRAGMA synchronous=NORMAL to PRAGMA synchronous=OFF Also all other database write operations just 'fly' now. I'm even more impressed with SQLite than before, but I wonder why is the change so _dramatic_ ? >From the docs my impression was that using WAL mode is ideal for bulk inserts. That wrapping large bulks of data into smaller batches, wrapped in BEGIN/COMMT is best for performance etc. That using synchronous=NORMAL limits the file system flush/wait operations certain really important operations.
I logged the execution times of various operations in this phase to a text file. Everything was fast, the processing, the INSERTs etc. But COMMIT operations sometimes took 20s, then 0.2s, then again 10s. That's the time SQLite spends in the execute call with "COMMIT". Of course the amount of data written in each transaction block varied, but in general, 85% of the total execution time of my code was spent in the COMMIT call. My application was the only application with measurable disk I/O at the time. No virus checker etc. Changing to synchronous=OFF made the commits 10 times faster. Is this the expected behavior or am I missing something obvious? _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users