On Sun, Nov 26, 2017 at 12:02 AM, Clemens Ladisch <clem...@ladisch.de>
wrote:

> Kevin O'Gorman wrote:
> > I wrote a super simple program to the read the file and count how many
> > records are already there.  I got impatient waiting for it so I killed
> > the process and added an output of one dot (".") per 1000 records.  It
> > went very fast for what I estimate was around 200 dots and hit a wall.
> > It made progress, but very very slowly.  [...]
> > The field being used for the lookup has an index.
>
> I'd guess that most records are found, and that the file and the table
> happen to be somewhat sorted.  The search becomes slow when the amount
> of data that needs to be read exceeds the available memory.
>
> > Why does commit() make a difference?
>
> Hmmm ... interesting.
>
> >             for row in conn.execute("""
> >                         SELECT pnum
> >                         FROM pos
> >                         WHERE ppos=?
> >                     """,(pos,)):
> >                 pnum = row[0]
> >                 break
> >             if pnum is None:
> >                 missing += 1
> >             else:
> >                 present += 1
>
> Even with the index on ppos, the DB still has to look up the table row
> to read the pnum value.
>
> You do not care about the actual pnum value, so you could replace it
> with a constant value ("SELECT 1 FROM ...").  Or just use EXISTS to
> show what you actually want to do:
>
>         cursor = conn.execute("""
>                 SELECT EXISTS (
>                         SELECT *
>                         FROM pos
>                         WHERE ppos = ?)
>                 """, (pos,))
>         exists = cursor.fetchone()[0]
>         if exists:
>             present += 1
>         else:
>             missing += 1
> <http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users>


That's real interesting and I'll keep it in mind for the future, but note
that my actual code DOES care about the contents of pmain, so it has to
look in the database anyway.  I'm still left wondering why it ran so slow
and why a commit() helped a read-only program running alone on the machine.

I'm gonna try this on my Xeon with 256 GB of RAM to check out the idea it
was running out of space.  It is true that the database is 50 GB so of
course it's bigger than the RAM on the usual desktop.


-- 
Dictionary.com's word of the year: *complicit*
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to