-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/06/2015 09:27 AM, Dan Kennedy wrote:
>> Is it using more CPU cycles in mmap mode or just taking longer?
>> If the former, does [time] attribute them to "user" or "system"?

It is taking longer.  I have 3 XML dumps which I turn into JSON
(incrementally) and then denormalise and insert into SQLite across
several tables.  While all this work is going on, the code prints out
statistics about how long it is running and about how many records per
second are being processed.  The final database size after commit and
vacuum is ~8GB.  There are a lot of foreign keys too, although all
simple (referring to an INTEGER PRIMARY KEY column).

I gave mmap a try - ie the *only* change was to add an extra pragma
before the beginning of the transaction:

   "pragma mmap_size="+str(2*1024*1024*1024)

In hard numbers, without that line I was doing ~1,118 records per
second and with it it does ~300.  A normal run takes about 1h20m but
the mmap one was still running 3 hours later when I aborted it.

(BTW this is all on a tmpfs filesystem on 64 bit Linux with swap
spread across two ssds, and 32GB of ram.  ie the actual storage
hardware isn't a factor.  Also single threaded because XML.)

>> How large are you letting the wal file grow between checkpoints?

Pretty much the entire database size.  For the tests I was starting
with a deleted database directory (ie no pre-existing files), and then
doing these pragmas:

          "pragma page_size=4096",
#         "pragma mmap_size="+str(2*1024*1024*1024),
          "pragma journal_mode=wal",
          "pragma wal_autocheckpoint=10000",
          "pragma foreign_keys=on",

Then I start a transaction, and do the importing within that
transaction.  The database file is 4kb during that process, the wal
file gets to be about 10GB.

If I use gdb to periodically break into the running process in the
mmap case, then it was always in sqlite3WalFindFrame.

I don't need any help fixing my importing process (eg don't need a
journal on an empty database anyway).  But it is frustrating that mmap
only goes up to a few kb shy of 2GB even for 64 bit, and I have one
example (ie anecdote not data) showing that mmap hurts for inserts on
> 2GB databases.  Perhaps it is worth others testing to see if this
> is
a systemic problem, or just bad luck for me :-)

It may also be relevant that tables add columns over time.  I
dynamically add them after encountering previously unseen fields in
the JSON.  However I'd expect the schema to be final a few thousand
records in.  Most tables have 3 to 6 million records.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlXDmukACgkQmOOfHg372QSVtgCbBihGgIuZqS3Yy2JARXZ1+Q59
GmwAoMG53XxuLNhcMIw1PV46fD/Z/5tT
=luXx
-----END PGP SIGNATURE-----

Reply via email to