I'm pretty new at SQLite, so this may seem obvious to you. Be kind. I'm using Python on Ubuntu Linux 16.04 LTS, and the sqlite that is built into Python. The database is using WAL.
I've got a database of some 100 million records, and a file of just over 300 thousand that I want represented in it. I wanted to check how much difference it was going to make, so I wrote a super simple program to the read the file and count how many records are already there. I got impatient waiting for it so I killed the process and added an output of one dot (".") per 1000 records. It went very fast for what I estimate was around 200 dots and hit a wall. It made progress, but very very slowly. So I killed it again and added a commit() call every time it output a dot. It didn't hit a wall, just some rough road (that is, it slowed down at about the same spot but not nearly so drastically). The code makes to changes to the database at all. Why does commit() make a difference? What else should I learn from this? The field being used for the lookup has an index. ++ kevin Code follows: #!/usr/bin/env python3 """Count the number of records that represent rows in the database 'pos' table. The database is not modified. Last Modified: Sat Nov 25 18:56:49 PST 2017 """ import os.path # https://docs.python.org/3.5/library/os.path.html import sys # https://docs.python.org/3.5/library/sys.html import argparse # https://docs.python.org/3.5/library/argparse.html import sqlite3 # https://docs.python.org/3.5/library/sqlite3.html import re # https://docs.python.org/3.5/library/re.html # from /usr/local/lib/python3.5/dist-packages import qcreate from qerror import * import myparser if __name__ == '__main__': parser = argparse.ArgumentParser(description="""A program to read positions and count how many are in the database""",) parser.add_argument("--dbname", default=None, help="name of the database to work on (overrides qubic.ini file)") parser.add_argument("file", nargs='?', type=argparse.FileType('r'), default=sys.stdin, help="file containing the qsearch results (default stdin)") args=parser.parse_args() infile = args.file if args.dbname is None: here=os.path.split(os.path.realpath('.'))[1] for confdir in ".",os.path.join(os.environ["HOME"],".config"),"/etc/xdg/": f = os.path.join(confdir, "qubic.ini") if os.path.exists(f): args.dbname = myparser.parse(f, here, "dbname") if args.dbname is not None: break if args.dbname is None: print(" *** ERROR: no database name provided and none found in qubic.ini files") sys.exit(1) present = missing = lines = 0 with sqlite3.connect(args.dbname) as conn: for line in infile: fields = line.split() pos = fields[0] if len(pos) != 64: # Important test to catch grep without --no-filename raise InputError(" ERROR: input line has wrong-sized position: " + line) pnum = None for row in conn.execute(""" SELECT pnum FROM pos WHERE ppos=? """,(pos,)): pnum = row[0] break if pnum is None: missing += 1 else: present += 1 lines += 1 if lines % 1000 == 0: print(".",flush=True,end="") conn.commit() print("there were",present,"records on file and",missing," were missing") print("out of a total of", lines, "records.") -- word of the year: *kakistocracy* _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users