On Sunday, 3 December, 2017 08:24, Richard Rousselot <richard.rousse...@gmail.com> wrote:
> Had similar issue a few years ago; we were using the SQLite3.exe. We
> recompiled the exe as 64 bit, which allowed it to use more than 4 GB of
> RAM, and loaded the machine with as much memory as we could. Voila,
> super fast processing.
>
> Can the Python libraries be made 64 bit somehow?

Yes. You must be using a 64-bit version of Python. The procedure for replacing the sqlite3.dll / sqlite3.so it uses is the same as for the 32-bit version, and so is compiling and using a 64-bit build of the apsw extension. Neither Windows nor Linux can thunk a dynamically loaded module so that it runs in a different model than the hosting process (that was proprietary IBM technology that no one else seems smart enough to duplicate), so you have to move Python to the 64-bit model as well.

On Windows 10 16299.98 I have both a 32-bit Python (2.7.14) and a 64-bit Python (3.6.4) installed, and I build 32-bit DLLs for the former and 64-bit DLLs for the latter from the same source (just selecting -m32 or -m64 as appropriate). I use the MinGW64/GCC compiler because (a) it can compile to either model depending on the switch, without requiring any code changes; (b) it supports long long and long double in 32-bit builds; and (c) it does not require the "moving target" Microsoft C Runtime libraries -- it can compile against the subsystem runtime (MSVCRT), which has been stable since, oh, the first OS/2 New Technology (which later became Windows NT) way back when. Oh, and MinGW/GCC does true position-independent code: when you statically link a module into either an executable or a dynamic load library, it is truly static, with no strange external dependencies.

Since SQLite3 is heavily I/O bound (or at least syscall/kernel-call bound, for mutexes and the like) in practically everything it does, the 64-bit version is much faster (about 20%) than the 32-bit version when running on a 64-bit OS, since the OS does not have to thunk the call stack when entering and returning from the kernel.
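A quick way to confirm which model your interpreter (and hence any extension it loads) is running -- a minimal sketch using only the standard library; the pointer size of the running process tells you whether you are in a 32-bit or 64-bit build:

```python
import struct
import platform
import sqlite3

# Pointer size of the running interpreter: 4 bytes -> 32-bit, 8 bytes -> 64-bit.
# Any extension module (sqlite3.dll / sqlite3.so, apsw) must match this model.
bits = struct.calcsize("P") * 8
print("Python model:", bits, "bit")
print("Interpreter:", platform.python_version())
print("Bundled SQLite library:", sqlite3.sqlite_version)
```

If this prints 32 on a 64-bit OS, dropping in a 64-bit sqlite3 library will fail to load; the interpreter itself has to be replaced first.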
> On Thu, Nov 30, 2017 at 7:01 PM Keith Medcalf <kmedc...@dessus.com> wrote:
>
>> Is there an index on pos where ppos is the left-most field (or the
>> only field) in the index? What is the column affinity of ppos? Of the
>> field you are passing as a parameter? Is ppos unique?
>>
>> If you CREATE [UNIQUE] INDEX searchindex ON pos (ppos, pnum); then
>> your query can be satisfied using only the searchindex covering index.
>>
>> If there is not an index on ppos, then you will be wasting time
>> recreating the index for each query.
>>
>> You will probably need to increase the cache size beyond the paltry
>> default in order for the entire btree structures to be cached in RAM --
>> you probably want to make it as big as you can.
>>
>> ---
>> The fact that there's a Highway to Hell but only a Stairway to Heaven
>> says a lot about anticipated traffic volume.
>>
>> >-----Original Message-----
>> >From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
>> >On Behalf Of Kevin O'Gorman
>> >Sent: Saturday, 25 November, 2017 20:14
>> >To: sqlite-users
>> >Subject: [sqlite] Simple read-only program very slow
>> >
>> >I'm pretty new at SQLite, so this may seem obvious to you. Be kind.
>> >I'm using Python on Ubuntu Linux 16.04 LTS, and the sqlite that is
>> >built into Python. The database is using WAL.
>> >
>> >I've got a database of some 100 million records, and a file of just
>> >over 300 thousand that I want represented in it. I wanted to check
>> >how much difference it was going to make, so I wrote a super simple
>> >program to read the file and count how many records are already
>> >there. I got impatient waiting for it, so I killed the process and
>> >added an output of one dot (".") per 1000 records. It went very fast
>> >for what I estimate was around 200 dots, and then hit a wall. It made
>> >progress, but very, very slowly.
>> >
>> >So I killed it again and added a commit() call every time it output a
>> >dot. It didn't hit a wall, just some rough road (that is, it slowed
>> >down at about the same spot, but not nearly so drastically).
>> >
>> >The code makes no changes to the database at all. Why does commit()
>> >make a difference? What else should I learn from this?
>> >
>> >The field being used for the lookup has an index.
>> >
>> >++ kevin
>> >
>> >Code follows:
>> >
>> >#!/usr/bin/env python3
>> >"""Count the number of records that represent rows in the database
>> >'pos' table.
>> >The database is not modified.
>> >
>> >   Last Modified: Sat Nov 25 18:56:49 PST 2017
>> >"""
>> >
>> >import os.path   # https://docs.python.org/3.5/library/os.path.html
>> >import sys       # https://docs.python.org/3.5/library/sys.html
>> >import argparse  # https://docs.python.org/3.5/library/argparse.html
>> >import sqlite3   # https://docs.python.org/3.5/library/sqlite3.html
>> >import re        # https://docs.python.org/3.5/library/re.html
>> >
>> ># from /usr/local/lib/python3.5/dist-packages
>> >import qcreate
>> >from qerror import *
>> >import myparser
>> >
>> >if __name__ == '__main__':
>> >    parser = argparse.ArgumentParser(description="""A program to read
>> >        positions and count how many are in the database""")
>> >    parser.add_argument("--dbname", default=None,
>> >        help="name of the database to work on (overrides qubic.ini file)")
>> >    parser.add_argument("file", nargs='?', type=argparse.FileType('r'),
>> >        default=sys.stdin,
>> >        help="file containing the qsearch results (default stdin)")
>> >    args = parser.parse_args()
>> >    infile = args.file
>> >
>> >    if args.dbname is None:
>> >        here = os.path.split(os.path.realpath('.'))[1]
>> >        for confdir in ".", os.path.join(os.environ["HOME"], ".config"), "/etc/xdg/":
>> >            f = os.path.join(confdir, "qubic.ini")
>> >            if os.path.exists(f):
>> >                args.dbname = myparser.parse(f, here, "dbname")
>> >                if args.dbname is not None:
>> >                    break
>> >        if args.dbname is None:
>> >            print(" *** ERROR: no database name provided and none found in qubic.ini files")
>> >            sys.exit(1)
>> >
>> >    present = missing = lines = 0
>> >    with sqlite3.connect(args.dbname) as conn:
>> >        for line in infile:
>> >            fields = line.split()
>> >            pos = fields[0]
>> >            if len(pos) != 64:  # Important test to catch grep without --no-filename
>> >                raise InputError(" ERROR: input line has wrong-sized position: " + line)
>> >
>> >            pnum = None
>> >            for row in conn.execute("""
>> >                    SELECT pnum
>> >                    FROM pos
>> >                    WHERE ppos=?
>> >                    """, (pos,)):
>> >                pnum = row[0]
>> >                break
>> >            if pnum is None:
>> >                missing += 1
>> >            else:
>> >                present += 1
>> >            lines += 1
>> >            if lines % 1000 == 0:
>> >                print(".", flush=True, end="")
>> >                conn.commit()
>> >    print("there were", present, "records on file and", missing, "were missing")
>> >    print("out of a total of", lines, "records.")
>> >
>> >--
>> >word of the year: *kakistocracy*
>> >_______________________________________________
>> >sqlite-users mailing list
>> >sqlite-users@mailinglists.sqlite.org
>> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
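Keith's two suggestions -- the (ppos, pnum) covering index and a larger page cache -- can be sketched together as follows. This is an illustration only, not Kevin's actual schema: the table and column names are taken from his program, the in-memory database stands in for his real file, and the cache size is an arbitrary example value. EXPLAIN QUERY PLAN confirms the lookup is answered from the index btree alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration; point this at the real file
conn.execute("CREATE TABLE pos (ppos TEXT, pnum INTEGER)")

# The covering index: the SELECT below reads only this btree, never the table.
conn.execute("CREATE UNIQUE INDEX searchindex ON pos (ppos, pnum)")

# Enlarge the page cache well beyond the default (a negative value means KiB).
conn.execute("PRAGMA cache_size = -1048576")  # ~1 GiB; tune to available RAM

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT pnum FROM pos WHERE ppos=?", ("x" * 64,)
).fetchall()
# The detail column should report a search USING COVERING INDEX searchindex.
print(plan[0][-1])
```

Without the index, the same plan would show a full table scan, which is consistent with the slowdown Kevin describes once the working set outgrows the default cache.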