samabhiK wrote:
>
> David,
>
> Thanks for such a detailed response. The data volume I mentioned is the
> total set of records we have, but we would never ever need to search the
> entire base in one query; we would divide the data by region or zip code.
> So, in that case I assume that for a single region, we would not have more
> than 200M records (this is real; we have a region with that many
> records).
>
> So, I can assume that I can create shards based on regions and the
> requests would get distributed among these region servers, right?
>
The fact that your searches are always (or almost always) per region helps a lot. Instead of doing a distributed search across all shards, you would search the specific shard, or in the worst case 2 shards, and not burden the other shards with queries you know they can't satisfy. This new information suggests that the total volume of 10k queries per second would be divided amongst your shards: 10k / 40 shards = 250 queries per second per shard. Now we are approaching something reasonable. If any of your regions need to scale up (more query volume) or out (a big region), you can handle that on a case-by-case basis. I can think of ways to optimize that for spatial.

Thinking in terms of pure queries per second on a machine, say one with 16 CPU cores, 250 / 16 ≈ 15.6, so roughly 16 queries per second per CPU core for a shard. I think that's plausible, but you would really need to measure how many you can actually do. I assume the spatial index is going to fit in RAM. If that works out, this means ~40 machines (one per region).

> You also mentioned about ~20 concurrent queries per shard - do you have
> links to some benchmarks? I am very interested to know about the hardware
> sizing details for such a setup.
>

The best I can offer is on the geospatial side:
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=12988316&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12988316
But this was an index of "only" 2M distinct points. These figures may still hold for a bigger index if the cost of the spatial query itself is low enough that other, constant overheads dominate the response times, but I really don't know. To be clear, that benchmark used older code that differs from the latest, but the two are algorithmically the same. The current code applies an error epsilon to the query shape, which helps it scale further.
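To make the per-region routing described earlier concrete, here is a minimal Java sketch, assuming a manual (non-SolrCloud) setup with one core per region. The `RegionRouter` class, the region names and the `host:port/solr/core` entries are hypothetical placeholders; the only Solr feature relied on is the standard `shards` request parameter for distributed search, which limits the fan-out to the listed cores. A query for a single region goes straight to that core with no distributed search at all.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical per-region router (not part of Solr): looks up which Solr
 * core(s) hold a region's documents and builds a /select URL that queries
 * only those cores instead of fanning out to every shard.
 */
final class RegionRouter {
    // Placeholder mapping: region name -> "host:port/solr/core" entries.
    private final Map<String, List<String>> shardsByRegion;

    RegionRouter(Map<String, List<String>> shardsByRegion) {
        this.shardsByRegion = shardsByRegion;
    }

    /** Collects the distinct shards for the requested regions (usually 1, worst case 2). */
    List<String> shardsFor(List<String> regions) {
        List<String> result = new ArrayList<>();
        for (String region : regions) {
            List<String> shards = shardsByRegion.get(region);
            if (shards == null) {
                throw new IllegalArgumentException("Unknown region: " + region);
            }
            for (String shard : shards) {
                if (!result.contains(shard)) {
                    result.add(shard);
                }
            }
        }
        return result;
    }

    /**
     * A single shard is queried directly; several shards are listed in the
     * "shards" parameter so the distributed search touches only those cores.
     */
    String buildSelectUrl(List<String> regions, String query) {
        List<String> shards = shardsFor(regions);
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("No shards for regions: " + regions);
        }
        String url = "http://" + shards.get(0) + "/select?q=" + enc(query);
        if (shards.size() > 1) {
            url += "&shards=" + enc(String.join(",", shards));
        }
        return url;
    }

    // URL-encodes a parameter value as UTF-8 (Java 10+ overload).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter(Map.of(
                "north", List.of("solr1:8983/solr/north"),
                "south", List.of("solr2:8983/solr/south")));
        System.out.println(router.buildSelectUrl(List.of("north"), "*:*"));
        System.out.println(router.buildSelectUrl(List.of("north", "south"), "*:*"));
    }
}
```

Running `main` prints a direct single-core URL for one region and a two-core distributed URL for a query that straddles a region border. Keeping the mapping outside Solr means a large or busy region can later be split across more cores by editing only its entry, without touching the other regions.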
There is plenty more optimization that could be done, like a more efficient binary grid scheme, using Hilbert curves, and using an optimizer to find the hotspots and try to optimize them.

> About setting up Solr for a single shard, I think I will go by your
> advice. Will see how much a single shard can handle on a decent machine
> :)
>
> The reason why I came up with that figure was, I have a user base of 500k
> and there's a lot of activity which would happen on the map - every time
> someone moves the tiles, zooms in/out, scrolls, we are going to send a
> server side request to fetch some data (I agree we can benefit much using
> caching but I believe Solr itself has its own local cache). I might be a
> bit unrealistic with my 10K rps projections but I have read about 9K rps
> to map servers from some sources on the internet.
>
> And, NO, I don't work for Google :) But who knows, we might be building
> something that can get so much traffic to us in a while. :D
>
> BTW, my question still remains - can we do search on polygonal areas on
> the map? If so, do you have any link where I can get more details?
> Bounding box thing won't work for me I guess :(
>
> Sam
>

Polygons are supported; I've been doing them for years now, but it requires some extensions. Today, you need the latest Solr trunk, you need to apply the Solr adapters to Lucene 4 spatial (SOLR-3304), and you need the JTS jar on your classpath, which you download separately. BTW, here are some basic docs: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

-----
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
Sent from the Solr - User mailing list archive at Nabble.com.