Indeed, the distribution across shards should be transparent. In fact, as a client I should not need to know anything about the shards at all. But since the current version of Solr (1.4) dictates an interface where you, as a client, must provide a list of shards, the responsibility has been shifted over to the client.
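For reference, in Solr 1.4 distributed search the client names the shards itself in the `shards` request parameter, as comma-separated `host:port/path` entries without an `http://` prefix. Below is a minimal Python sketch of building such a request URL. The helper `build_shard_query` and the host names are hypothetical, and only the standard `q`, `shards`, `start` and `rows` parameters are used.

```python
from urllib.parse import urlencode


def build_shard_query(base_url, shards, query, start=0, rows=10):
    """Build a distributed /select URL (hypothetical helper, not part of Solr)."""
    params = {
        "q": query,
        # Solr 1.4 expects host:port/path entries separated by commas.
        "shards": ",".join(shards),
        "start": start,  # offset into the merged result list
        "rows": rows,    # page size
    }
    # urlencode percent-encodes ':', '/' and ','; Solr decodes them again.
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

Every shard in that list is consulted for every request, which is why the list itself becomes the client's concern.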
Since we receive so much data that we must add a new shard every month, we have to be shard-aware on the client side. My understanding of Solr is that the final response to a query is only complete when every shard in the query's shard list has been consulted. This means that the slowest ship sets the pace, so to speak. Or worse: if any shard in the list fails, the whole response fails! What I hope to achieve is a way of trimming shards from the list for a query. If I know roughly how many hits a given query has in each shard, then I can control paging myself and put only the shards I know will contain the requested documents in the query's shard list. Otherwise I'm worried about performance once we have dozens of shards.

So, to summarise: we are developing a system where a given search will be performed again and again over time on an ever-increasing document base. The first time a search is run, it will be distributed across every shard in order to get a total from the beginning of time up to the timestamp of the query's debut. This total is cached and thereafter maintained by querying only the most recent shards, from the last date until now.

Mostly the documents arrive in chronological order, but occasionally they arrive out of order. The shards are organised by date interval, which means that every shard will from time to time receive more documents. This will introduce a slight discrepancy between the cached total and the actual total, but that is a discrepancy we can live with.

However, I would also like to know how many hits there are in each individual shard. If I know this, I can tailor a precise shard list for the query: since I know the offset and page size of the query, and I know how many hits each shard has, I can calculate which shards to include. This is a lot of client-side administration, I know, but I guess (I hope) it will perform quite well...

Is this idea crazy or what?
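The shard-selection arithmetic described above could be sketched like this. It is a minimal Python sketch, not something Solr provides, and it rests on assumptions: the per-shard hit counts are already known (for example, cached from earlier runs of the same query), shards are listed newest first, and results are sorted by date descending so that shard order matches result order. Under relevance sorting that last assumption does not hold. The function name `select_shards` and the shard labels are hypothetical.

```python
from typing import List, Optional, Tuple


def select_shards(counts: List[Tuple[str, int]], offset: int,
                  page_size: int) -> Tuple[List[str], int]:
    """Pick the shards that overlap the page [offset, offset + page_size).

    counts: (shard, hits) pairs ordered newest first (assumed date sort).
    Returns the shards to query and the 'start' value to send, which is
    relative to the first selected shard rather than to all shards.
    """
    window_end = offset + page_size
    selected: List[str] = []
    start: Optional[int] = None
    seen = 0  # total hits in all shards before the current one
    for shard, hits in counts:
        if seen >= window_end:
            break  # every remaining shard lies beyond the requested page
        if hits > 0 and seen + hits > offset:
            if start is None:
                # Hits in the skipped (newer) shards no longer count.
                start = offset - seen
            selected.append(shard)
        seen += hits
    return selected, (start if start is not None else 0)
```

For a page at offset 20 with 20 rows over shards holding 5, 30 and 40 hits, only the second and third shards are needed, and the query sent to them should use `start=15` rather than 20, because the 5 hits in the newest shard are no longer part of the merged result.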
-- View this message in context: http://lucene.472066.n3.nabble.com/match-count-per-shard-and-across-shards-tp2369627p2382411.html Sent from the Solr - User mailing list archive at Nabble.com.