Re: [sqlite] speeding up a query

P Kishor Sat, 26 Aug 2006 13:09:30 -0700

On 8/26/06, Cory Nelson <[EMAIL PROTECTED]> wrote:

Instead of indexing each column on its own, try making them one big index.



Thanks. Actually making a composite index really helped. The match
rate went from 2.5/sec to more than 82/sec. Very nice. I did the
following --

sqlite> CREATE INDEX ix_polys ON polys (xmin, ymin, xmax, ymax);
sqlite> CREATE INDEX ix_points ON points (x, y);

I am still looking at more than 22 hours of computing (assuming linear
performance), although it will likely be much better on the actual
production machine (my test box is a beat-up Dell laptop with 1.4 Ghz
P3 and 768 Mb ram; the production box is a 4 CPU Xeon box with 4 Gb
ram, although multiple CPUs won't probably help).

On 8/26/06, Joe Wilson <[EMAIL PROTECTED]> wrote:

The sample poly.name's you've provided in your example seem to be integers.
Is that always the case? Are the poly_id's unique? If both of these
are true, you might consider making poly_id your primary key in the
polys table in order to eliminate a column.


Poly_ids are indeed integers and unique, and they are also a primary
key. Although, I don't understand how that allows me to eliminate a
column.


This problem is very difficult to do in pure SQL efficiently.


Actually, I have tried many other approaches, and SQL seems to be a
fairly efficient one. Here is why -- in the worst case, I have to test
each point against each poly (a geometry algorithm allows me to test
if the point is definitely in the poly or not). That is the worst
case, as it is 200k X 5 mill = 1 trillion transactions.

Instead, I am storing the rect boxes of each polys (polys are
irregular, so there could be more than one polys with the same rect).
For each point, this allows me to narrow down the possible polys that
I would have to match. Using a query like

  SELECT name
  FROM polys
  WHERE
    xmin < (SELECT x FROM points WHERE point_id = ?) AND
    ymin < (SELECT y FROM points WHERE point_id = ?) AND
    xmax > (SELECT x FROM points WHERE point_id = ?) AND
    ymax > (SELECT y FROM points WHERE point_id = ?)

if I get a single match, I already know that point is in the matched
poly. On the other hand, if I get more than one matches, I have to run
my geometry algorithm against just those handful of polys.

Anyway, that is my logic.

You might conisder reading in all the polys in memory,
and then doing a single pass over your point data, assigning a name to
each point as you go along. The SQL indexes would be of no use to you
with this algorithm. Instead, you would need a sorted perl array of
poly \references for each of ymin, ymax, xmin, xmax.
The 200K polys and in-memory perl indexes should take no more than
20M of RAM.



Reading all the polys in memory does seem like a good idea, however, I
won't be able to do a "narrowing down" of potential points-in-polys. I
would have to match every point against every poly, and that would be
very slow.

Any further suggestions most welcome.

--- P Kishor <[EMAIL PROTECTED]> wrote:

> Greets,
>
> Using SQLite for Windows 3.3.7. I have the following two tables
> -- bounding box of each poly
> CREATE TABLE polys (
>   poly_id INTEGER PRIMARY KEY,
>   xmin    REAL,
>   ymin    REAL,
>   xmax    REAL,
>   ymax    REAL,
>   name    TEXT
> )
>
> data look like so
>
> 
1|1723885.18957644|282631.95646140|1727224.46537863|287816.54753434|100030117004
> 
2|1716073.04710809|281166.39606662|1725746.97483727|287098.00276086|100030118009
>
> -- coord pair of each point
> CREATE TABLE points (
>   point_id  INTEGER PRIMARY KEY,
>   x         REAL,
>   y         REAL,
>   name      TEXT
> )
>
> data look like so
>
> 1|-2268900.28781180|191367.60670709|
> 2|-2269660.73941476|193426.66511514|
>
> I have built the following indexes
>
> CREATE INDEX ix_polys_xmin ON polys (xmin)
> CREATE INDEX ix_polys_ymin ON polys (ymin)
> CREATE INDEX ix_polys_xmax ON polys (xmax)
> CREATE INDEX ix_polys_ymax ON polys (ymax)
> CREATE INDEX ix_points_x ON points (x)
> CREATE INDEX ix_points_y ON points (y)
>
> Here is what I want to do: I want to SET points.name = polys.name WHERE
>
>   polys.xmin < (SELECT points.x FROM points WHERE point_id = ?) AND
>   polys.ymin < (SELECT points.y FROM points WHERE point_id = ?) AND
>   polys.xmax > (SELECT points.x FROM points WHERE point_id = ?) AND
>   polys.ymax > (SELECT points.y FROM points WHERE point_id = ?)
>
> yields one and only one result. For all other records, I want to run a
> further point-in-poly function which enables me to find an exact
> match. In other words, I minimize the number of times I have to run my
> point-in-poly function by getting SQLite's help in eliminating the
> points I am know to definitely fall inside a specific poly.
>
> So, to test my approach, I set up a loop over the points (I am using
> Perl DBI), and tried the following --
>
>   foreach point_id
>
>     SELECT name
>     FROM polys
>     WHERE
>       xmin < (SELECT x FROM points WHERE point_id = ?) AND
>       ymin < (SELECT y FROM points WHERE point_id = ?) AND
>       xmax > (SELECT x FROM points WHERE point_id = ?) AND
>       ymax > (SELECT y FROM points WHERE point_id = ?)
>
>     If only one polys.name was found, that's it
>
>     elsif more than one polys.name were found
>       run another function to determine exact match
>
> Well, I learned that perhaps this may not be the way to do this. You
> see, I have more than 200k rows in my polys table, and more than 5
> million rows in the points table. I ran the above query last night,
> went to sleep, and this morning it was still churning away (or had
> frozen the 'puter). I had to kill it. Looping over each point is just
> way too slow.
>
> Another approach would be to loop over each poly and narrow my set to
> all the points in it. Then run my function on each point against that
> poly. However, this will also likely take very long -- one, I have to
> still loop over each row, and then, when I find the points, I have
> check each point (as I might have overlapping polys).
>
> Suggestions?
>
> --
> Puneet Kishor http://punkish.eidesis.org/
> Nelson Inst. for Env. Studies, UW-Madison http://www.ies.wisc.edu/
> Open Source Geospatial Foundation https://edu.osgeo.org/
>
> -----------------------------------------------------------------------------
> To unsubscribe, send email to [EMAIL PROTECTED]
> -----------------------------------------------------------------------------
>
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------



--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.ies.wisc.edu/
Open Source Geospatial Foundation https://edu.osgeo.org/

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] speeding up a query

Reply via email to