I'm pretty new at SQLite, so this may seem obvious to you.  Be kind.
I'm using Python on Ubuntu Linux 16.04 LTS, and the sqlite that is built
into Python.  The database
is using WAL.

I've got a database of some 100 million records, and a file of just over
300 thousand that I want represented in it.  I wanted to check how much
difference it was going to make, so I wrote a super
simple program to the read the file and count how many records are already
there.  I got impatient waiting for it so I killed the process and added an
output of one dot (".") per 1000 records.  It went very fast for what I
estimate was around 200 dots and hit a wall.  It made progress, but very
very slowly.

So I killed it again and added a commit() call every time it output a dot.
It didn't hit a wall, just some rough road (that is, it slowed down at
about the same spot but not nearly so drastically).

The code makes to changes to the database at all.  Why does commit() make a
difference?  What else should I learn from this?

The field being used for the lookup has an index.

++ kevin

Code follows:
#!/usr/bin/env python3
"""Count the number of records that represent rows in the database 'pos'
table.
The database is not modified.

 Last Modified: Sat Nov 25 18:56:49 PST 2017
"""

import os.path          # https://docs.python.org/3.5/library/os.path.html
import sys              # https://docs.python.org/3.5/library/sys.html
import argparse         # https://docs.python.org/3.5/library/argparse.html
import sqlite3          # https://docs.python.org/3.5/library/sqlite3.html
import re               # https://docs.python.org/3.5/library/re.html

# from /usr/local/lib/python3.5/dist-packages
import qcreate
from qerror import *
import myparser

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="""A program to read
positions and count how many are
            in the database""",)
    parser.add_argument("--dbname", default=None,
            help="name of the database to work on (overrides qubic.ini
file)")
    parser.add_argument("file", nargs='?', type=argparse.FileType('r'),
default=sys.stdin,
                help="file containing the qsearch results (default stdin)")
    args=parser.parse_args()
    infile = args.file

    if args.dbname is None:
        here=os.path.split(os.path.realpath('.'))[1]
        for confdir in
".",os.path.join(os.environ["HOME"],".config"),"/etc/xdg/":
            f = os.path.join(confdir, "qubic.ini")
            if os.path.exists(f):
                args.dbname = myparser.parse(f, here, "dbname")
                if args.dbname is not None:
                    break
        if args.dbname is None:
            print(" *** ERROR: no database name provided and none found in
qubic.ini files")
            sys.exit(1)

    present = missing = lines = 0
    with sqlite3.connect(args.dbname) as conn:
        for line in infile:
            fields = line.split()
            pos = fields[0]
            if len(pos) != 64: # Important test to catch grep without
--no-filename
                raise InputError(" ERROR: input line has wrong-sized
position: " + line)

            pnum = None
            for row in conn.execute("""
                        SELECT pnum
                        FROM pos
                        WHERE ppos=?
                    """,(pos,)):
                pnum = row[0]
                break
            if pnum is None:
                missing += 1
            else:
                present += 1
            lines += 1
            if lines % 1000 == 0:
                print(".",flush=True,end="")
                conn.commit()
    print("there were",present,"records on file and",missing," were
missing")
    print("out of a total of", lines, "records.")



-- 
word of the year: *kakistocracy*
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to