David,

Firstly, thanks for putting together such a thorough email.  It helps a lot to 
understand some of the things we were just guessing at because (as you 
mentioned a few times) the documentation around all of this is rather sparse.

I’ll explain the context around the use case we’re trying to solve and then 
attempt to respond as best I can to each of your points.  What we have is a 
list of documents whose location is sometimes a point and sometimes a circle.  
These represent either inventory at a physical location (the point case) or 
inventory that can be delivered to you within X km, configurable per document 
(the circle case).  We want to allow a user to ask for all documents within X 
distance of their location, plus all documents that can be delivered to their 
point, where the delivery distance is defined on the inventory (creating the 
circle).

This is why we were actually trying to combine both point-based data and 
poly/circle data into a single geospatial field: I don’t believe you can do 
something like fq=geofilt(latlng, x, y, d) OR geofilt(latlngCircle, x, y, 1), 
but perhaps we’re just not getting quite the right syntax.
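
To make the two-field idea concrete, here is a rough sketch of how such an OR 
filter might be built.  It assumes the standard query parser's nested 
_query_:"..." syntax accepts geofilt clauses, which has not been verified on 
our Solr version, and "latlngCircle" is a hypothetical field name.

```python
from urllib.parse import urlencode


def geofilt(sfield, lat, lon, d_km):
    """Render one {!geofilt} clause with explicit local params."""
    return f"{{!geofilt sfield={sfield} pt={lat},{lon} d={d_km}}}"


def either_filter(lat, lon, radius_km, circle_d_km=1.0):
    """OR a point-field filter with a circle-field filter.

    Assumption: each clause is wrapped in the _query_:"..." nested-query
    form so the lucene parser can combine them with OR.
    """
    clauses = [
        geofilt("latlng", lat, lon, radius_km),          # existing point field
        geofilt("latlngCircle", lat, lon, circle_d_km),  # hypothetical field
    ]
    return " OR ".join(f'_query_:"{clause}"' for clause in clauses)


# URL-encode the filter the same way the select handler would receive it.
params = {"q": "*:*", "fq": either_filter(33.9798087, -94.3286133, 10)}
query_string = urlencode(params)
```

If this form works, both fields could keep their own types instead of being 
merged into one multi-purpose field.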

* Personally, I find it highly confusing to have a field named "latlng" and 
have it be anything other than a simple point -- it's all you have if given a 
single latitude longitude pair.  If you intend for the data to be a circle 
(either exactly or approximated) then perhaps call it latLngCircle

        - This is happening because we’re trying to combine two different use 
cases into a single field, since I don’t think we have that option from the 
query side.  The name is really just us re-using our current field for this 
exploration, but would probably end up being named something different.

* geodist() and for that matter any other attempt to get the distance to a 
non-point shape is not going to work -- either error or confusing results; I 
forget.  This is hard to do and the logic isn't there for it, and probably 
wouldn't perform to user's expectations if it did.  This ought to be documented 
but seems not to be.

        -Good to know, so no matter what we’ll have to have a point value 
stored somewhere for each document and calculate geodist on that.
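
A minimal sketch of what that could look like on the query side, assuming two 
hypothetical fields: "latlngShape" holding the circle/polygon for filtering 
and "latlngPoint" holding the center point for distance.  With no arguments, 
geodist() reads sfield and pt from the request.

```python
def search_params(lat, lon, radius_km):
    """Filter on the shape field, but compute and sort distance on the point.

    Field names are hypothetical; geofilt picks up pt and d from the request
    while its own sfield is given as a local param.
    """
    return {
        "q": "*:*",
        "pt": f"{lat},{lon}",
        "d": str(radius_km),
        "fq": "{!geofilt sfield=latlngShape}",  # shape field: filtering only
        "sfield": "latlngPoint",                # point field: used by geodist()
        "fl": "ID,dist:geodist()",
        "sort": "geodist() asc",
    }
```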

* Generally RptWithGeometrySpatialField should be used over 
SpatialRecursivePrefixTreeFieldType unless you want heatmaps or are willing to 
make trade-offs in higher index size and lossy precision in order to get faster 
search.  It's up to you; if you benchmark both I'd love to hear how it went.

        -We may explore both, but typically we’re more interested in speed 
than accuracy.  Benchmarking may be a very interesting exercise, however.  For 
sorting, for instance, we’re actually using sqedist instead of geodist because 
we’re not overly concerned about sorting accuracy.

* In WKT format, the ordinate order is "X Y" (thus longitude then latitude).  
Looking at your triangle, it is extremely close to Antarctica, and I'm 
skeptical you intended that. This is not directly documented AFAICT but it's 
such a common mistake that it ought to be called out in the docs.

        -Definitely did not intend it to be close to Antarctica.  I think we 
tried both orders but probably went back to lat,long, which was definitely the 
more common one in our (failed) testing.
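
To avoid repeating that mistake, a small helper along these lines could do the 
swap when building WKT from (lat, lon) pairs.  The function name is made up 
for illustration; the coordinates are the triangle from the original message.

```python
def to_wkt_polygon(latlon_pairs):
    """Build a WKT POLYGON from (lat, lon) pairs, emitting "lon lat" (X Y)."""
    ring = list(latlon_pairs)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a WKT ring repeats its first vertex at the end
    coords = ", ".join(f"{lon} {lat}" for lat, lon in ring)
    return f"POLYGON(({coords}))"


triangle = [
    (33.7942704, -84.4412613),
    (33.7100611, -84.4028091),
    (33.7802888, -84.3279648),
]
wkt = to_wkt_polygon(triangle)
```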


* I see you are using Geo3D, which is not the default.  Geo3D is strict about 
the coordinate order -- counter-clockwise.  Your triangle is clockwise and thus 
it has an inverted interpretation -- thus it's a shape that covers nearly the 
whole globe.  I recently documented this 
https://issues.apache.org/jira/browse/SOLR-13467 but it's not published yet 
since it's so new.

        - Thanks for this clarification as well.  I had read this in the WKT 
docs too; it’s something we tried, but we really weren’t sure what the right 
answer was and had been going back and forth on it.  The documentation seems 
to say that you need to specify either JTS or Geo3D, but doesn’t provide much 
guidance about which to use when.  Since JTS requires adding another jar 
manually, which complicates our build process significantly (at least vs. 
using Geo3D), we tried Geo3D.  I’d love to hear more about the tradeoffs and 
other considerations between the two, but it sounds like we should switch to 
JTS (the default, correct?)
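
If we stay on Geo3D, a winding check before indexing might help.  The sketch 
below uses the planar shoelace formula on (x, y) = (lon, lat) pairs, which is 
only an approximation on a sphere and is assumed to be adequate for small 
polygons away from the poles and the dateline.  The function names are 
illustrative.

```python
def signed_area(ring):
    """Shoelace formula on (x, y) pairs; positive means counter-clockwise."""
    pts = list(ring)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def ensure_ccw(ring):
    """Return the ring in counter-clockwise order, reversing it if needed."""
    pts = list(ring)
    return pts if signed_area(pts) > 0 else pts[::-1]


# The triangle exactly as it was indexed: latitude in the X slot.
as_sent = [
    (33.7942704, -84.4412613),
    (33.7100611, -84.4028091),
    (33.7802888, -84.3279648),
]
# The same vertices with the ordinates swapped to (lon, lat).
swapped = [(y, x) for x, y in as_sent]
```

By this check the triangle as sent is clockwise, while the same vertex order 
with lon/lat swapped comes out counter-clockwise, so fixing the ordinate order 
may also fix the winding here.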


* You can absolutely index a circle in Solr -- this is something cool and 
somewhat unique. And you don't need format=legacy.  The documentation needs to 
call this out better, though it at least refers to circles as a "buffered 
point" which is the currently supported way of representing it, and it does 
have one example.  Search for "BUFFER" and you'll see a WKT-like syntax to do 
it.  BUFFER is not standard WKT; it was added on to do this.  The first arg is 
a X Y center, and 2nd arg is a distance in decimal degrees (not km).  BTW Geo3D 
is a good choice here but not essential either.

-       This sounds very promising and we’ll definitely spend some time here 
because it may ultimately be what we really want to use.  It sounds like Geo3D 
may actually be the right choice now?
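
Since our delivery distances are in km, a conversion like the following would 
be needed when building the buffered point.  This is a sketch: the 
BUFFER(POINT(x y), d) form is how I read the docs example, and the Earth mean 
radius constant is an assumption.

```python
import math

# Assumed mean Earth radius in km; one degree of arc is then about 111.195 km.
EARTH_MEAN_RADIUS_KM = 6371.0087714
KM_PER_DEGREE = math.pi * EARTH_MEAN_RADIUS_KM / 180.0


def buffered_point_wkt(lat, lon, radius_km):
    """Render a circle as a buffered point: X Y center, radius in degrees."""
    radius_deg = radius_km / KM_PER_DEGREE
    return f"BUFFER(POINT({lon} {lat}), {radius_deg:.6f})"


# Hypothetical document: 20 km delivery radius around a made-up location.
doc = {"ID": "example-1", "latlngCircle": buffered_point_wkt(33.75, -84.5, 20)}
```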

One other question I have is what the behavior will be if both my point and my 
search radius are entirely inside the circle/polygon.  For example, 
geofilt(x,y,10) against a buffered point of BUFFER(x y 20km) (written in km 
instead of decimal degrees for simplicity).  Will this document be returned 
even though my filter is entirely inside the polygon, or is it looking for 
edge intersections?

Thanks so much for the response/help!!!


-Marshall Sanders



On 7/26/19, 12:01 AM, "David Smiley" <david.w.smi...@gmail.com> wrote:

    Hello Marshall,
    
    I worked on a lot of this functionality.  I have lots to say:
    
    * Personally, I find it highly confusing to have a field named "latlng" and
    have it be anything other than a simple point -- it's all you have if given
    a single latitude longitude pair.  If you intend for the data to be a
    circle (either exactly or approximated) then perhaps call it latLngCircle
    * geodist() and for that matter any other attempt to get the distance to a
    non-point shape is not going to work -- either error or confusing results;
    I forget.  This is hard to do and the logic isn't there for it, and
    probably wouldn't perform to user's expectations if it did.  This ought to
    be documented but seems not to be.
    * Generally RptWithGeometrySpatialField should be used
    over SpatialRecursivePrefixTreeFieldType unless you want heatmaps or are
    willing to make trade-offs in higher index size and lossy precision in
    order to get faster search.  It's up to you; if you benchmark both I'd love
    to hear how it went.
    * In WKT format, the ordinate order is "X Y" (thus longitude then
    latitude).  Looking at your triangle, it is extremely close to Antarctica,
    and I'm skeptical you intended that. This is not directly documented AFAICT
    but it's such a common mistake that it ought to be called out in the docs.
    * I see you are using Geo3D, which is not the default.  Geo3D is strict
    about the coordinate order -- counter-clockwise.  Your triangle is
    clockwise and thus it has an inverted interpretation -- thus it's a shape
    that covers nearly the whole globe.  I recently documented this
    https://issues.apache.org/jira/browse/SOLR-13467 but it's not published yet
    since it's so new.
    * You can absolutely index a circle in Solr -- this is something cool and
    somewhat unique. And you don't need format=legacy.  The documentation needs
    to call this out better, though it at least refers to circles as a
    "buffered point" which is the currently supported way of representing it,
    and it does have one example.  Search for "BUFFER" and you'll see a
    WKT-like syntax to do it.  BUFFER is not standard WKT; it was added on to
    do this.  The first arg is a X Y center, and 2nd arg is a distance in
    decimal degrees (not km).  BTW Geo3D is a good choice here but not
    essential either.
    
    Back to your core requirement -- you want to index circles and sort results
    by distance.  Can you please elaborate better on this... distance to the
    outer ring of the circle or the center point?  Center point is easy to do
    simply by putting the center point additionally in a field using
    LatLonPointSpatialField and use geodist referring to that.  Also,
    
    FYI geodist() is a function that can take arguments directly which makes
    more sense when multiple spatial fields are in play.  Sadly this aspect is
    not documented.  Suffice it to say, if you do geodist(latLng) (maybe
    quoted?) then it'll use that field, and parse "pt" param from the request.
    
    ~ David Smiley
    Apache Lucene/Solr Search Developer
    
    http://www.linkedin.com/in/davidwsmiley
    
    
    On Tue, Jul 23, 2019 at 2:32 PM Sanders, Marshall (CAI - Atlanta) <
    marshall.sande...@coxautoinc.com> wrote:
    
    > We’re trying to index a polygon into solr and then filter/calculate
    > geodist on the polygon (ideally we actually want a circle, but it looks
    > like that’s not really supported officially by wkt/geojson and instead you
    > have to switch format=”legacy” which seems like something that might be
    > removed in the future so don’t want to rely on it).
    >
    > Here’s the info from schema:
    > <field name="latlng" type="location_rpt" indexed="true" stored="true"
    > multiValued="true"/>
    >
    > <fieldType name="location_rpt"
    > class="solr.SpatialRecursivePrefixTreeFieldType"
    >                    geo="true" distErrPct="0.025" maxDistErr="0.000009"
    > distanceUnits="kilometers"
    >                     spatialContextFactory="Geo3D"/>
    >
    >
    > We’ve tried indexing some different data, but to keep it as simple as
    > possible we started with a triangle (will eventually add more points to
    > approximate a circle).  Here’s an example document that we’ve added just
    > for testing:
    >
    > {
    > "latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091,
    > 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
    > "ID": "284598223"
    > }
    >
    >
    > However, it seems like filtering/distance calculations aren’t working (at
    > least not the way we are used to doing it for points).  Here’s an example
    > query where the pt is several hundred kilometers away from the polygon, yet
    > the document still returns.  Also, it seems that regardless of origin point
    > or polygon location the calculated geodist is always 20015.115
    >
    > Example query:
    >
    > select?d=1&fl=ID,latlng,geodist()&fq=%7B!geofilt%7D&indent=on&pt=33.9798087,-94.3286133&q=*:*&sfield=latlng&wt=json
    >
    > Example documents coming back anyway:
    > "docs": [
    > {
    > "latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091,
    > 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
    > "ID": "284598223",
    > "geodist()": 20015.115
    > },
    > {
    > "latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091,
    > 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
    > "ID": "284600596",
    > "geodist()": 20015.115
    > }
    > ]
    >
    >
    > Anyone who has experience in this area can you point us in the right
    > direction about what we’re doing incorrectly with either how we are
    > indexing the data and/or how we are querying against the polygons.
    >
    > Thank you,
    >
    >
    > --
    > Marshall Sanders
    > Principal Software Engineer
    > Autotrader.com
    > marshall.sande...@coxautoinc.com<mailto:marshall.sande...@coxautoinc.com>
    >
    >
    >
    
