-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/06/2015 09:27 AM, Dan Kennedy wrote: >> Is it using more CPU cycles in mmap mode or just taking longer? >> If the former, does [time] attribute them to "user" or "system"?
It is taking longer. I have 3 XML dumps which I turn into JSON (incrementally) and then denormalise and insert into SQLite across several tables. While all this work is going on, the code prints out statistics about how long it is running and about how many records per second are being processed. The final database size after commit and vacuum is ~8GB. There are a lot of foreign keys too, although all simple (referring to an INTEGER PRIMARY KEY column). I gave mmap a try - ie the *only* change was to add an extra pragma before the beginning of the transaction: "pragma mmap_size="+str(2*1024*1024*1024) In hard numbers, without that line I was doing ~1,118 records per second and with it it does ~300. A normal run takes about 1h20m but the mmap one was still running 3 hours later when I aborted it. (BTW this is all on a tmpfs filesystem on 64 bit Linux with swap spread across two ssds, and 32GB of ram. ie the actual storage hardware isn't a factor. Also single threaded because XML.) >> How large are you letting the wal file grow between checkpoints? Pretty much the entire database size. For the tests I was starting with a deleted database directory (ie no pre-existing files), and then doing these pragmas: "pragma page_size=4096", # "pragma mmap_size="+str(2*1024*1024*1024), "pragma journal_mode=wal", "pragma wal_autocheckpoint=10000", "pragma foreign_keys=on", Then I start a transaction, and do the importing within that transaction. The database file is 4kb during that process, the wal file gets to be about 10GB. If I use gdb to periodically break into the running process in the mmap case, then it was always in sqlite3WalFindFrame. I don't need any help fixing my importing process (eg don't need a journal on an empty database anyway). But it is frustrating that mmap only goes up to a few kb shy of 2GB even for 64 bit, and I have one example (ie anecdote not data) showing that mmap hurts for inserts on > 2GB databases. Perhaps it is worth others testing to see if this > is a systemic problem, or just bad luck for me :-) It may also be relevant that tables add columns over time. I dynamically add them after encountering previously unseen fields in the JSON. However I'd expect the schema to be final a few thousand records in. Most tables have 3 to 6 million records. Roger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlXDmukACgkQmOOfHg372QSVtgCbBihGgIuZqS3Yy2JARXZ1+Q59 GmwAoMG53XxuLNhcMIw1PV46fD/Z/5tT =luXx -----END PGP SIGNATURE-----