On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote:
> AFAICT, SolrCloud addresses the use case of distributed update for a
> relatively smaller number of collections (dozens?) that have a relatively
> larger number of rows - billions over a modest to moderate number of nodes
> (a handful to a dozen or dozens). So, maybe dozens of collections (some
> people still call these "cores") that distribute hundreds of millions if
> not billions of rows over dozens (or potentially low hundreds) of nodes.
> Technically, ZK was designed for thousands of nodes, but I don't think that
> was for the use case of distributed query that constantly fans out to all
> shards.

Not sure I get what you're saying here. ZK was designed for thousands of nodes, and the way it works is by making sure that each node keeps an active local cache of all the data relevant to it, so nodes don't need to poll ZK for that data. Therefore, as far as ZK is concerned, it is irrelevant how many hosts are involved in any particular transaction: the node that is handling the distribution consults its cached list of active nodes, decides which one to hit, and off it goes, no interaction with ZK required.

Or am I missing something?

Upayavira
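
For illustration only, here is a rough sketch of the caching pattern described above. It is a toy model, not SolrCloud or ZooKeeper code: ToyCoordinator, CachingNode, and every method name are hypothetical stand-ins, and the change notification is a plain callback rather than a real ZK watch. The point is just that routing decisions are served from the local cache, so the number of coordinator reads does not grow with the number of requests.

```python
class ToyCoordinator:
    """Hypothetical stand-in for a ZK ensemble (assumption: not a real ZK API)."""

    def __init__(self):
        self.live_nodes = set()
        self.watchers = []
        self.reads = 0  # how many times any client fetched state from us

    def get_live_nodes(self):
        self.reads += 1
        return set(self.live_nodes)

    def watch(self, callback):
        # Register a callback to run whenever the live-node list changes.
        self.watchers.append(callback)

    def set_live_nodes(self, nodes):
        self.live_nodes = set(nodes)
        for callback in list(self.watchers):
            callback()


class CachingNode:
    """A node that keeps its own copy of the live-node list."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.cache = []
        self.next_index = 0
        coordinator.watch(self.refresh)
        self.refresh()  # one initial read to fill the cache

    def refresh(self):
        # Only called at startup and when the coordinator reports a change.
        self.cache = sorted(self.coordinator.get_live_nodes())

    def route(self):
        # Served entirely from the local cache; no coordinator call here.
        if not self.cache:
            raise RuntimeError("no live nodes known")
        target = self.cache[self.next_index % len(self.cache)]
        self.next_index += 1
        return target


zk = ToyCoordinator()
zk.set_live_nodes({"node1", "node2", "node3"})
node = CachingNode(zk)

targets = [node.route() for _ in range(1000)]
reads_after_routing = zk.reads      # still just the initial read

zk.set_live_nodes({"node1", "node2"})  # a node drops out; the cache refreshes
reads_after_change = zk.reads
```

In this toy model a thousand routing decisions cost one coordinator read, and a membership change costs one more. Whether the query fan-out itself scales is a separate question from the load placed on ZK.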