[sqlite] storing coordinate pairs efficiently?

Simon Slavin Wed, 20 Apr 2016 14:41:05 +0100

On 20 Apr 2016, at 2:16pm, Wolfgang Enzinger <sqlite at enzinger.net> wrote:


> CREATE INDEX "pdata_spatial_index" ON "pdata"("gx","gy");
> 
> [snip]
> 
> However, EQP also shows that it is only used with the value of gx, not gy:

It would be used if you searched for one x value and a range of y values.  If 
you do have a range in x then the index is not sorted in a convenient order to 
then search for a range in y.

> OTOH, it doesn't feel right to have no index at all on "gy", so this
> fields got its own index:
> 
> CREATE INDEX "pdata_spatial_index_x" ON "pdata"("gx");
> CREATE INDEX "pdata_spatial_index_y" ON "pdata"("gy");
> 
> With ANALYZE data available,

You must do ANALYZE after you have created the indexes, and after your table 
has a realistic number of realistic rows in it.  If you're doing that, you're 
fine.  And I think that creating two indexes is the right thing to do.

> SQLite is now smart enough to choose the most
> efficient search path, depending on the width / height of the search range:
> 
> SELECT * FROM pdata WHERE gx BETWEEN 210000 AND 220000 and gy BETWEEN 200000
> AND 390000
> --> SEARCH TABLE pdata USING INDEX pdata_spatial_index_x (gx>? AND gx<?)
> 
> SELECT * FROM pdata WHERE gx BETWEEN 200000 AND 300000 and gy BETWEEN 210000
> AND 220000
> --> SEARCH TABLE pdata USING INDEX pdata_spatial_index_y (gy>? AND gy<?)
> 
> So far this seems the most efficient solution to me. Any input on this?

Looks good to me.  But do you really need SELECT * or would SELECT ID be 
sufficient ?  I'll explain why.

SQLite has an optimization for searches.  If it decides to use a certain index 
for the search, and all the columns needed by the SELECT are present in that 
index, then it returns the values taken from the index entry.  It does not need 
to first find the index entry and then consult the table.

So ...
if the speed of these searches is too slow to suit your needs,
and you actually do need both coordinates,
and you don't mind sacrificing speed of data entry
and some disk space,
you may be better off with

CREATE INDEX "pdata_spatial_index_x" ON "pdata"("gx","gy");
CREATE INDEX "pdata_spatial_index_y" ON "pdata"("gy","gx");

(There is no need to add ID to the index since it is your primary key so it is 
always 'invisibly' entered at the end of every index definition.)

This would allow SQLite to pull gx, gy and ID off of whichever of those indexes 
it decided to use, without then having to look the ID up in your table.

However I note that you have "/* a couple more fields here */" in your schema 
so perhaps this would not be worth doing.  Or perhaps elegance and disk space 
are more important to you so you wouldn't want to do it for other reasons.

Simon.

[sqlite] storing coordinate pairs efficiently?

Reply via email to