solr-user wrote
> 
> Thanks David.  No worries about the delay; am always happy and
> appreciative when someone responds.
> 
> I don't understand what you mean by "All center points get cached into
> memory upon first use in a score" in question 2 about the Java OOM errors
> I am seeing.
> 

The underlying field type receives one internal Shape instance per WKT
string handed to it, whether or not that WKT is a MultiGeometry.  The center
point of that shape is indexed in such a way that it can be read into a
cache later.  It doesn't matter how many vertices/coordinates your
geometries have or how many shapes exist in a single WKT string; one WKT
string value results in one point.  Just wanted to be clear on that.
STNumPoints is the wrong statistic since it counts internal coordinates,
from my reading of its documentation just now.  STNumGeometries isn't right
either if your WKT uses any of the Multi* geometry types.
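
To make the difference between the two statistics concrete, here is a
minimal Python sketch.  It assumes plain 2D WKT with decimal coordinates
(no Z/M values, no exponent notation); the function name and the sample data
are hypothetical.  It counts WKT string values, which is what determines the
number of cached center points, next to raw coordinates, which is roughly
what an STNumPoints-style total measures.

```python
import re

# Matches one "x y" coordinate pair inside a 2D WKT string (assumption:
# plain decimal numbers, no Z/M ordinates, no exponent notation).
COORD = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")

def count_statistics(docs):
    """docs: one list of WKT strings per document (a multi-valued field).

    Returns (values, coords): the number of WKT string values and the
    total number of coordinate pairs found across all of them.
    """
    values = sum(len(wkts) for wkts in docs)
    coords = sum(len(COORD.findall(w)) for wkts in docs for w in wkts)
    return values, coords

# Hypothetical sample: two documents, three WKT values in total.
docs = [
    ["POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))"],
    ["MULTIPOINT((1 1), (2 2), (3 3))", "POINT(5 5)"],
]
values, coords = count_statistics(docs)
print(values, coords)  # 3 WKT values, 9 coordinate pairs
```

Only the first number matters for the center-point cache: the MULTIPOINT
value above contributes one entry, not three.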


solr-user wrote
> 
> The Solr instance I have setup for testing has around 200k docs, with one
> WKT field per doc (indexed and stored and set to multivalue).
> 
> I did a count of the number of points that get indexed in Solr (computed
> in MS SQL by counting the number of points (using STNumPoints) for each
> geometry (using STNumGeometries) in the WKT data I am indexing), and I
> have around 35M points total.
> 
> If only the center points for 190K docs get cached, wouldn't that easily
> fit in 7GB of heap? 
> 
> Even if Solr was caching 35M points, that still doesn't sound like 7GB
> worth of data.
> 

Yeah... the memory cache may be pig-ish but not that bad.  Something about
the implementation tells me there could be a bug if any of your polygon
shapes are small and/or you index at a high resolution.  Given that you have
multi-valued spatial data per document, you can't simply use
solr.LatLonType.  Try this -- create a new field called centerPoints or
something like that, using the same field type as the geohash one you
already have.  But for this one, hand Solr the center points of your shape
data.  Hopefully it's straightforward for you to calculate these.  Then when
you sort by distance or need to retrieve the distance via a dist:query(...)
etc., be sure to use this field and NOT the main shape one that has the full
shape indexed.  To be sure the spatial module doesn't load the center points
for the main shape field, pass needScore=false as a Solr local-param in your
filter query for it.
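
If it helps, calculating the center points on the client side could look
something like the Python sketch below.  Big caveats: it uses the center of
each shape's bounding box as a stand-in for the "center" (the spatial module
may define the center differently), it assumes plain 2D WKT that doesn't
cross the dateline, and the shape field name "geo" and the helper names are
made up for illustration.  How the point has to be formatted for your field
type is left open, so the sketch just returns (x, y) tuples.

```python
import re

# Matches one "x y" coordinate pair inside a 2D WKT string (assumption:
# plain decimal numbers, no Z/M ordinates, no exponent notation).
COORD = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")

def bbox_center(wkt):
    """Return the (x, y) center of the bounding box of one WKT string.

    This is an approximation of the shape's center, not necessarily the
    same point the spatial module would compute.
    """
    pairs = [(float(x), float(y)) for x, y in COORD.findall(wkt)]
    if not pairs:
        raise ValueError("no coordinates found in WKT: %r" % wkt)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)

def build_doc(doc_id, wkts, shape_field="geo"):
    """Build a document dict with the full shapes plus one center per shape.

    "geo" is a hypothetical name for the existing shape field;
    "centerPoints" is the new multi-valued field suggested above.
    """
    return {
        "id": doc_id,
        shape_field: list(wkts),
        "centerPoints": [bbox_center(w) for w in wkts],
    }

doc = build_doc("1", ["POLYGON((0 0, 4 0, 4 2, 0 2, 0 0))", "POINT(10 20)"])
print(doc["centerPoints"])  # [(2.0, 1.0), (10.0, 20.0)]
```

Each WKT value yields exactly one center point, so centerPoints ends up with
the same number of values per document as the main shape field.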

Hopefully that fixes it.  If it does, there is a bug and I know what it is.

~ David



-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p4000276.html
Sent from the Solr - User mailing list archive at Nabble.com.
