samabhiK wrote
> 
> David,
> 
> Thanks for such a detailed response. The data volume I mentioned is the
> total set of records we have - but we would never ever need to search the
> entire base in one query; we would divide the data by region or zip code.
> So, in that case I assume that for a single region, we would not have more
> than 200M records (this is real , we have a region with that many
> records).
> 
> So, I can assume that I can create shards based on regions and the
> requests would get distributed among these region servers, right?
> 

The fact that your searches are always per region (or almost always) helps
things a lot.  Instead of doing a distributed search to all shards, you
would search the specific shard, or worst case 2 shards, and not burden the
other shards with queries you know won't be satisfied.  This new information
suggests that the total 10k queries per second volume would be divided
amongst your shards, so 10k / 40 shards = 250 queries per second.  Now we
are approaching something reasonable.  If any of your regions need to scale
up (more query volume) or out (big region) then you can do that on a case by
case basis.  I can think of ways to optimize that for spatial.
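
As a rough sketch of that routing idea, the application could map each
region to its own shard and query only the shards a request touches.  The
region names, shard URLs, and the shards_for helper below are all
hypothetical; none of this is something Solr provides for you.

```python
# Hypothetical region -> shard URL table; names and hosts are made up.
REGION_SHARDS = {
    "north": "http://solr-north:8983/solr",
    "south": "http://solr-south:8983/solr",
    "east": "http://solr-east:8983/solr",
}


def shards_for(regions):
    """Return the shard URLs needed for a request, without duplicates."""
    urls = []
    for region in regions:
        url = REGION_SHARDS[region]  # raises KeyError for an unknown region
        if url not in urls:
            urls.append(url)
    return urls


print(shards_for(["north"]))          # the common case: one shard
print(shards_for(["north", "east"]))  # worst case: a query spanning 2 regions
```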

Thinking in terms of pure queries per second on a machine, say one with 16
CPU cores, then 250/16 = ~16 queries per second per CPU core of a shard.  I
think that's plausible, but you would really need to measure how many you
could actually do.  I assume the spatial index is going to fit in RAM.  If
successful, this means ~40 machines (one per region).
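
Spelled out, the back-of-the-envelope arithmetic looks like this.  It
assumes the load is spread evenly across regions, which it probably won't
be, so treat the result as a starting point for a real benchmark rather
than a sizing figure.

```python
total_qps = 10_000       # projected total load from this thread
shards = 40              # one shard per region (assumed evenly loaded)
cores_per_machine = 16   # example machine size

qps_per_shard = total_qps / shards
qps_per_core = qps_per_shard / cores_per_machine

print(qps_per_shard)            # 250.0
print(round(qps_per_core, 1))   # 15.6
```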



>  You also mentioned about ~20 concurrent queries per shard - do you have
> links to some benchmarks? I am very interested to know about the hardware
> sizing details for such a setup.
> 

The best I can offer is on the geospatial side: 
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=12988316&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12988316

But this was an index of "only" 2M distinct points.  It may be that these
figures still hold if the overhead of the spatial query is so low that other
constant costs dominate the times, but I really don't know.
To be clear, this is older code that is not the same as the latest, but they
are algorithmically the same.  The current code has an error epsilon to the
query shape which helps scale further.  There is plenty more optimization
that could be done, like a more efficient binary grid scheme, using Hilbert
Curves, and using an optimizer to find the hotspots to try and optimize
them.



> About setting up Solr for a single shard, I think I will go by your
> advice.  Will see how much a single shard can handle in a decent machine
> :)
> 
> The reason why I came up with that figure was, I have a user base of 500k
> and theres a lot of activity which would happen on the map - every time
> someone moves the tiles, zooms in/out, scrolls, we are going to send a
> server side request to fetch some data ( I agree we can benefit much using
> caching but I believe Solr itself has its own local cache). I might be a
> bit unrealistic with my 10K rps projections but I have read about 9K rps
> to map servers from some sources on the internet. 
> 
> And, NO, I don't work for Google :) But who knows we might be building
> something that can get so much traffic to us in a while. :D
> 
> BTW, my question still remains - can we do search on polygonal areas on
> the map? If so, do you have any link where i can get more details?
> Bounding Box thing wont work for me I guess :(
> 
> Sam
> 

Polygons are supported; I've been doing them for years now.  But it requires
some extensions.  Today, you need the latest Solr trunk, you need to apply
the Solr adapters to Lucene 4 spatial (SOLR-3304), and you need to have the
JTS jar on your classpath, which you download separately.  BTW here are
some basic docs:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
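
To give a feel for what a polygon filter might look like from the client
side, here is a small sketch that builds one as a WKT polygon wrapped in
Intersects(...).  The field name "geo", the polygon_filter helper, and the
sample coordinates are made up, and I'm assuming the Intersects(WKT) filter
syntax; check the wiki page above for what your build actually accepts.

```python
from urllib.parse import urlencode


def polygon_filter(field, points):
    """Build an Intersects(POLYGON(...)) filter from (lon, lat) pairs."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must end on their first point
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f'{field}:"Intersects(POLYGON(({coords})))"'


# "geo" is a hypothetical spatial field name; WKT order is x y (lon lat).
fq = polygon_filter("geo", [(-10, 30), (-40, 40), (-10, -20), (40, 20)])
print(fq)
print(urlencode({"q": "*:*", "fq": fq}))  # URL-encoded request parameters
```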



-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995333.html
Sent from the Solr - User mailing list archive at Nabble.com.
