Thanks for the info, Daniel. I will go forth and make a better client.
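For anyone else who hits this: the plan is to move the indexing code from raw HTTP posts onto the ZK-aware SolrJ client, so updates go straight to the shard leader instead of being forwarded. A minimal sketch against the Solr 4.x API (CloudSolrServer; the ZooKeeper hosts and collection name below are placeholders for our setup):

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
        public static void main(String[] args) throws Exception {
            // CloudSolrServer watches cluster state in ZooKeeper and routes
            // each update directly to the right shard leader, so no Solr
            // node has to forward the request on our behalf.
            CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            solr.setDefaultCollection("collectionA");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            solr.add(doc);
            solr.commit();
            solr.shutdown();
        }
    }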
On Oct 29, 2014, at 2:28 AM, Daniel Collins <danwcoll...@gmail.com> wrote:

> I kind of think this might be "working as designed", but I'll be happy to
> be corrected by others :)
>
> We had a similar issue which we discovered by accident: we had 2 or 3
> collections spread across some machines, and we accidentally sent an
> indexing request to a node in the cloud that didn't have a replica of
> collection1 (but it had other collections). We saw an instant jump in
> indexing latency to 5s, which given that the previous latencies had been
> ~20ms was rather obvious!
>
> Querying seems to be fine with this kind of forwarding approach, but
> indexing would logically require ZK information (to find the right shard
> for the destination collection and the leader of that shard), so I'm
> wondering if a node in the cloud that has a replica of collection1 has that
> information cached, whereas a node in the (same) cloud that only has a
> collection2 replica only has collection2 information cached, and has to go
> to ZK for every "forwarding" request.
>
> I haven't checked the code recently, but that seems plausible to me. Would
> you really want all your collection2 nodes to be running ZK watches for all
> collection1 updates as well as their own collection2 watches? That would
> clog them up processing updates that, in all honesty, they shouldn't have
> to deal with. Every node in the cloud would have to have a watch on
> everything else, which, if you have a lot of independent collections, would
> be an unnecessary burden on each of them.
>
> If you use SolrJ as a client, it will route to a correct node in the
> cloud (which is what we ended up using, through JNI, which was
> "interesting"), but if you are using HTTP to index, that's something your
> application has to take care of.
>
> On 28 October 2014 19:29, Matt Hilt <matt.h...@numerica.us> wrote:
>
>> I have three equal machines, each running SolrCloud (4.8). I have multiple
>> collections that are replicated but not sharded. I also have document
>> generation processes running on these nodes, which involves querying the
>> collection ~5 times per document generated.
>>
>> Node 1 has a replica of collection A and is running document generation
>> code that pushes to the HTTP /update/json handler.
>> Node 2 is the leader of collection A.
>> Node 3 does not have a replica of collection A, but is running document
>> generation code for collection A.
>>
>> The issue I see is that node 1 can push documents into Solr 3-5 times
>> faster than node 3 when they both talk to the Solr instance on their
>> localhost. If either of them talks directly to the Solr instance on node 2,
>> the performance is excellent (on par with node 1). To me it seems that the
>> only difference in these cases is the query/put request forwarding. Does
>> this involve some slow ZooKeeper communication that should be avoided? Any
>> other insights?
>>
>> Thanks