I kind of think this might be "working as designed", but I'll be happy to be corrected by others :)
We had a similar issue which we discovered by accident: we had 2 or 3
collections spread across some machines, and we accidentally sent an
indexing request to a node in the cloud that didn't have a replica of
collection1 (but did have other collections). We saw an instant jump in
indexing latency to 5s, which, given that previous latencies had been
~20ms, was rather obvious!

Querying seems to be fine with this kind of forwarding approach, but
indexing logically requires ZK information (to find the right shard for
the destination collection and the leader of that shard). So I'm
wondering whether a node in the cloud that has a replica of collection1
has that information cached, whereas a node in the (same) cloud that
only has a collection2 replica has only collection2 information cached
and has to go to ZK for every "forwarding" request. I haven't checked
the code recently, but that seems plausible to me.

Would you really want all your collection2 nodes to be running ZK
watches for all collection1 updates as well as their own collection2
watches? That would clog them up processing updates that, in all
honesty, they shouldn't have to deal with. Every node in the cloud would
have to have a watch on everything else, which, if you have a lot of
independent collections, would be an unnecessary burden on each of them.

If you use SolrJ as a client, it will route requests to the correct node
in the cloud (which is what we ended up doing, through JNI, which was
"interesting"), but if you are indexing over HTTP, that's something your
application has to take care of. There's a rough sketch of the SolrJ
approach below, after the quoted message.

On 28 October 2014 19:29, Matt Hilt <matt.h...@numerica.us> wrote:

> I have three equal machines each running SolrCloud (4.8). I have
> multiple collections that are replicated but not sharded. I also have
> document generation processes running on these nodes which involve
> querying the collection ~5 times per document generated.
>
> Node 1 has a replica of collection A and is running document generation
> code that pushes to the HTTP /update/json handler.
> Node 2 is the leader of collection A.
> Node 3 does not have a replica of collection A, but is running document
> generation code for collection A.
>
> The issue I see is that node 1 can push documents into Solr 3-5 times
> faster than node 3 when they both talk to the Solr instance on their
> localhost. If either of them talks directly to the Solr instance on
> node 2, the performance is excellent (on par with node 1). To me it
> seems that the only difference in these cases is the query/put request
> forwarding. Does this involve some slow zookeeper communication that
> should be avoided? Any other insights?
>
> Thanks
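
For reference, here's roughly what the SolrJ approach looks like. This
is only a sketch, not our actual code: the ZooKeeper address and the
collection/field names are placeholders, and it uses the 4.x-era
CloudSolrServer class (later SolrJ versions call it CloudSolrClient).
The point is that the client watches the cluster state in ZK and sends
each update straight to the leader of the right shard, so you never pay
the forwarding hop:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LeaderAwareIndexer {
    public static void main(String[] args) throws Exception {
        // Point the client at ZooKeeper (placeholder address) so it can
        // read the cluster state and route updates to shard leaders.
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collectionA");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title_s", "goes straight to the leader of collection A");

        solr.add(doc);    // sent to the shard leader, not forwarded by another node
        solr.commit();
        solr.shutdown();
    }
}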