Hi,

I have looked at:
http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene

This looks like it provides a "proper" way which I will try out for sure, but I 
also wanted to compare it against a less "proper" approach.

In an application we are storing offers which can be available in multiple 
stores, and we want users to be able to define their current position and 
select a range for how far away matches may be (currently we can accept this 
range being a rectangle or a circle; eventually we might want polygons).

We are using PostgreSQL, which has really nice GIS support, and we are planning 
on having Solr provide us only the ids in search results anyway.

So now I see two alternative approaches to the "proper" one mentioned above:

1) We always first fetch the list of store ids from PostgreSQL that fit in the 
range before querying Solr. If users do not change the range coordinates 
between queries we could cache this list, but it still means three steps: query 
PostgreSQL to get the list of stores, place this list into the Solr query to 
fetch the ids, and then get the metadata for the given ids out of PostgreSQL. 
Also, in extreme cases the list of stores to place into the Solr query could 
get quite long.
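
A minimal sketch of the middle step of approach 1, turning the store ids 
returned by PostgreSQL into a Solr filter query string. The field name 
store_id and the function name are hypothetical and would need to match the 
real schema; the field:(a OR b) grouping is ordinary Lucene query syntax.

```python
def build_store_filter(store_ids, field="store_id"):
    """Build a filter query string matching any of the given store ids.

    `field` is a hypothetical field name, not taken from a real schema.
    """
    # Deduplicate and sort so that the same set of stores always produces
    # the same string, which makes the result easier to cache.
    unique_ids = sorted(set(int(s) for s in store_ids))
    if not unique_ids:
        # No store lies in the range, so there is nothing to ask Solr for.
        return None
    return "%s:(%s)" % (field, " OR ".join(str(i) for i in unique_ids))


print(build_store_filter([3, 1, 2, 3]))
```

The string grows linearly with the number of stores in range, which is the 
"quite long" case mentioned above.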

2) Somehow store the multiple x-y coordinate combinations and filter on them. I 
am not sure if we could use 2 multivalued fields (x-coordinates and 
y-coordinates) for this, for example. Since the order is maintained, in theory 
it could be possible to do a rectangle filter (using the upper left hand and 
lower right hand coordinates of the rectangle as bounds) that pairs the first 
value from the x-coordinates field with the first from the y-coordinates field, 
the second from x-coordinates with the second from y-coordinates, and so on. 
Obviously each offer could have a different number of locations, and the number 
of locations could be fairly high too.
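
To show why the pairing by position matters, here is a small sketch of what 
happens if the two multivalued fields are filtered independently, i.e. one 
range filter on the x values and another on the y values. The function names 
and the sample data are made up for illustration.

```python
def in_range(value, low, high):
    """True if value lies strictly between low and high."""
    return low < value < high


def matches_independently(xs, ys, x_min, x_max, y_min, y_max):
    """Mimic two separate range filters on multivalued x and y fields.

    Each filter is satisfied as soon as ANY value of its field is in range,
    without checking that the x and y values belong to the same location.
    """
    return (any(in_range(x, x_min, x_max) for x in xs)
            and any(in_range(y, y_min, y_max) for y in ys))


# One offer with two stores, at (1, 9) and (9, 1).
xs = [1, 9]
ys = [9, 1]

# Neither store lies in the square 0..2 x 0..2, yet the offer matches,
# because x=1 comes from the first store and y=1 from the second.
print(matches_independently(xs, ys, 0, 2, 0, 2))
```

So the filter would have to compare the i-th x value only with the i-th y 
value to avoid such false positives.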

So to illustrate the comparison, let's assume that we just have one location. 
In this scenario we could just have two separate fields x_coord and y_coord and 
the query would look something like this:
x_coord > x_upper_left_coord AND x_coord < x_lower_right_coord AND y_coord > 
y_upper_left_coord AND y_coord < y_lower_right_coord

Now in order to support multiple values it would need to do the following, 
matching an offer if the condition holds for any i (which would need to be 
index supported in order to perform decently):
for (i = 0; i < count_values(x_coord); i++)
  x_coord[i] > x_upper_left_coord AND x_coord[i] < x_lower_right_coord AND 
  y_coord[i] > y_upper_left_coord AND y_coord[i] < y_lower_right_coord
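
The loop above can be sketched as runnable code. This is only an illustration 
of the intended semantics, not of how Solr would evaluate it. It assumes that 
x grows to the right and y grows downward, so the upper left corner has the 
smaller value on both axes; the names Rect, location_in_rect and offer_matches 
are invented for this example.

```python
from collections import namedtuple

# Rectangle given by its upper-left and lower-right corners.
# Assumption: x grows to the right and y grows downward.
Rect = namedtuple(
    "Rect", "x_upper_left y_upper_left x_lower_right y_lower_right")


def location_in_rect(x, y, rect):
    """Single-location check: is the point (x, y) strictly inside rect?"""
    return (rect.x_upper_left < x < rect.x_lower_right
            and rect.y_upper_left < y < rect.y_lower_right)


def offer_matches(x_coords, y_coords, rect):
    """Multi-location check: True if any (x_coords[i], y_coords[i]) is inside."""
    if len(x_coords) != len(y_coords):
        raise ValueError("x_coords and y_coords must have the same length")
    # zip pairs the i-th x value with the i-th y value, as in the loop above.
    return any(location_in_rect(x, y, rect)
               for x, y in zip(x_coords, y_coords))


rect = Rect(0, 0, 10, 10)
print(offer_matches([5], [5], rect))          # one location, inside
print(offer_matches([20, 3], [4, 30], rect))  # two locations, none inside
print(offer_matches([20, 3], [4, 7], rect))   # second location (3, 7) inside
```

This is a plain linear scan over every location of every offer, which is 
exactly the part that would need index support to perform decently.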

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


