Wow, thanks. So assuming I have a five-node ensemble and one machine is rolling along as leader, am I correct in assuming that as the leader becomes taxed it can lose the election and another node takes over as leader? The leader actually floats around the ensemble under load? I was thinking the leader was merely for referential integrity and things stayed that way until a physical failure.
This would all seem important when building indexes. I think I need to set up a sniffer. Identifying the node with a hash id seems very cool. If my app makes the call to the server with the appropriate shard, then there might only be messaging on the Zookeeper network. Is this a correct assumption? Is my terminology cross-threaded? Oh well, time to build my first cluster. I wrote all my clients with single-shard collections on a standalone server. Now I need to make sure my app is not a cluster buster. I feel like I am on the right path.

Thanks and Best,

GW

On 18 December 2016 at 09:53, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> On Sun, Dec 18, 2016 at 3:48 PM, GW <thegeofo...@gmail.com> wrote:
>
> > Yeah,
> >
> > I'll look at the proxy you suggested shortly.
> >
> > I've discovered that the idea of making a zookeeper-aware app is
> > pointless when scripting REST calls, right after I installed
> > libzookeeper.
> >
> > Zookeeper is there to provide the zookeeping for Solr: end of story. Me
> > thinks....
> >
> > I believe what really has to happen is: connect to the admin API to get
> > status:
> >
> > /solr/admin/collections?action=CLUSTERSTATUS
> >
> > I think it is more sensible to make a cluster-aware app.
> >
> > <lst name="Merchants"><str name="replicationFactor">1</str><lst
> > name="shards"><lst name="shard1"><str
> > name="range">80000000-7fffffff</str><str name="state">active</str><lst
> > name="replicas"><lst name="core_node1"><str
> > name="core">FrogMerchants_shard1_replica1</str><str name="base_url">
> > http://10.128.0.2:8983/solr</str><str
> > name="node_name">10.128.0.2:8983_solr</str><str
> > name="state">active</str><str
> > name="leader">true</str></lst></lst></lst></lst>
> >
> > I can get an array of nodes that have a state of active. So if I have 7
> > nodes that are state = active, I will have those in an array. Then I can
> > use the rand() function with the array count to select a node/url to
> > post a JSON string.
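[Editor's note: GW's plan in the quoted message, collect the replicas whose state is active from CLUSTERSTATUS and then pick one at random, can be sketched in Python. This assumes the status is fetched as JSON (`wt=json`); the collection name, URLs, and nesting below merely mirror the XML snippet quoted above and are illustrative, not live data.]

```python
import random

# Hypothetical CLUSTERSTATUS response (wt=json), mirroring the quoted XML.
cluster_status = {
    "cluster": {
        "collections": {
            "Merchants": {
                "shards": {
                    "shard1": {
                        "range": "80000000-7fffffff",
                        "state": "active",
                        "replicas": {
                            "core_node1": {
                                "core": "FrogMerchants_shard1_replica1",
                                "base_url": "http://10.128.0.2:8983/solr",
                                "state": "active",
                                "leader": "true",
                            }
                        },
                    }
                }
            }
        }
    }
}

def active_urls(status, collection):
    """Collect the base_url of every replica whose state is 'active'."""
    urls = []
    shards = status["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica["state"] == "active":
                urls.append(replica["base_url"])
    return urls

urls = active_urls(cluster_status, "Merchants")
target = random.choice(urls)  # pick one active node at random, as described
```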
> > It would eliminate the need for a load balancer. I think.....
>
> If you send to random(node), there is a high chance (increasing with the
> number of nodes/shards) that the node won't have the leader, so that node
> will also redirect the request to the leader. What you can do is compute
> the hash of the 'id' field locally. With the hash of the id you get the
> shard-id (because each shard has a hash-range), and with the shard you
> find the leader, and you find which node the leader is on
> (cluster-status), and send the request directly to the leader, certain
> that it won't be redirected again (fewer network hops).
>
> > // pseudo code
> >
> > $array_count = count($active_nodes);
> >
> > $url_target = rand(0, $array_count - 1);
> >
> > // create a function to pull the url, something like
> >
> > $url = get_solr_url($url_target);
> >
> > I have a test server on my bench. I'll spin up a 5 node cluster today,
> > get my app cluster aware and then get into some Solr indexes with Vi
> > and totally screw with some shards.
> >
> > If I am correct I will post again.
> >
> > Best,
> >
> > GW
> >
> > On 15 December 2016 at 12:34, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 12/14/2016 7:36 AM, GW wrote:
> > > > I understand accessing solr directly. I'm doing REST calls to a
> > > > single machine.
> > > >
> > > > If I have a cluster of five servers and say three Apache servers, I
> > > > can round robin the REST calls to all five in the cluster?
> > > >
> > > > I guess I'm going to find out. :-) If so I might be better off just
> > > > running Apache on all my solr instances.
> > >
> > > If you're running SolrCloud (which uses zookeeper) then sending
> > > multiple query requests to any node will load balance the requests
> > > across all replicas for the collection. This is an inherent feature
> > > of SolrCloud. Indexing requests will be forwarded to the correct
> > > place.
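[Editor's note: Dorian's routing suggestion can be sketched as follows. One caveat: Solr's default compositeId router hashes the id with MurmurHash3, which is not reproduced here; any signed 32-bit hash value stands in, so this sketch shows only the hash-range lookup, not Solr's exact hashing. The range string comes from the CLUSTERSTATUS snippet quoted earlier.]

```python
def parse_range(shard_range):
    """Parse a Solr shard range like '80000000-7fffffff' into signed 32-bit ints."""
    def to_signed(hex_str):
        value = int(hex_str, 16)
        # Hex ranges are unsigned 32-bit; Solr hash values are signed.
        return value - (1 << 32) if value >= (1 << 31) else value
    lo, hi = shard_range.split("-")
    return to_signed(lo), to_signed(hi)

def shard_for(doc_hash, shard_ranges):
    """Return the shard whose [lo, hi] range contains the document's hash."""
    for name, (lo, hi) in shard_ranges.items():
        if lo <= doc_hash <= hi:
            return name
    return None

# A single shard covering the whole hash ring, as in the quoted snippet:
ranges = {"shard1": parse_range("80000000-7fffffff")}
```

With more than one shard the ranges partition the ring into disjoint signed intervals, so the simple interval test is enough; the leader's base_url then comes from the replica whose leader flag is true in the cluster status.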
> > >
> > > The node you're sending to is a potential single point of failure,
> > > which you can eliminate by putting a load balancer in front of Solr
> > > that connects to at least two of the nodes. As I just mentioned,
> > > SolrCloud will do further load balancing to all nodes which are
> > > capable of serving the requests.
> > >
> > > I use haproxy for a load balancer in front of Solr. I'm not running
> > > in Cloud mode, but a load balancer would also work for Cloud, and is
> > > required for high availability when your client only connects to one
> > > server and isn't cloud aware.
> > >
> > > http://www.haproxy.org/
> > >
> > > Solr includes a cloud-aware Java client that talks to zookeeper and
> > > always knows the state of the cloud. This eliminates the requirement
> > > for a load balancer, but using that client would require that you
> > > write your website in Java.
> > >
> > > The PHP clients are third-party software, and as far as I know, are
> > > not cloud-aware.
> > >
> > > https://wiki.apache.org/solr/IntegratingSolr#PHP
> > >
> > > Some advantages of using a Solr client over creating HTTP requests
> > > yourself: The code is easier to write, and to read. You generally do
> > > not need to worry about making sure that your requests are properly
> > > escaped for URLs, XML, JSON, etc. The response to the requests is
> > > usually translated into data structures appropriate to the language --
> > > your program probably doesn't need to know how to parse XML or JSON.
> > >
> > > Thanks,
> > > Shawn
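[Editor's note: the load-balancing Shawn describes can be approximated client-side when a real balancer such as haproxy is not available. A minimal sketch with hypothetical node URLs; it only illustrates spreading requests across nodes, and unlike haproxy it has no health checks or failover.]

```python
from itertools import cycle

class RoundRobin:
    """Minimal client-side round-robin over Solr node URLs (hypothetical)."""

    def __init__(self, urls):
        self._urls = cycle(urls)

    def next_url(self):
        # Each call returns the next node in rotation.
        return next(self._urls)

rr = RoundRobin(["http://solr1:8983/solr", "http://solr2:8983/solr"])
```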