Wow, thanks.

So assuming I have a five-node ensemble and one machine is rolling along as
leader, am I correct to assume that as the leader becomes taxed it can lose
the election and another node takes over as leader? Does the leader actually
float about the ensemble under load? I was thinking the leader was merely for
referential integrity and things stayed that way until a physical failure.

This would all seem important when building indexes.

I think I need to set up a sniffer.

Identifying the node with a hash id seems very cool. If my app makes the
call to the server with the appropriate shard, then there might only be
messaging on the Zookeeper network. Is this a correct assumption?
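
A minimal sketch of the hash-range lookup described in the quoted reply
below, in Python for illustration. It assumes the collection entry from
CLUSTERSTATUS has already been parsed into a dict shaped like the XML quoted
further down, and that the document hash is computed elsewhere with whatever
function Solr's router uses (not shown here). The function names, the
two-shard layout and the 10.128.0.3 / 10.128.0.4 addresses are made up for
the example.

```python
def to_signed32(hex_str):
    """Interpret a hex string such as '80000000' as a signed 32-bit integer."""
    value = int(hex_str, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def leader_url_for_hash(collection, doc_hash):
    """Return the base_url of the leader of the shard whose range covers doc_hash.

    Returns None if no shard range matches or the shard has no leader listed.
    """
    for shard in collection["shards"].values():
        low_hex, high_hex = shard["range"].split("-")
        if to_signed32(low_hex) <= doc_hash <= to_signed32(high_hex):
            for replica in shard["replicas"].values():
                # Assumption: the leader flag arrives as the string "true".
                if replica.get("leader") == "true":
                    return replica["base_url"]
    return None


# Hypothetical two-shard collection; only 10.128.0.2 comes from the thread.
two_shards = {
    "shards": {
        "shard1": {
            "range": "80000000-ffffffff",
            "replicas": {
                "core_node1": {"base_url": "http://10.128.0.2:8983/solr", "leader": "true"},
            },
        },
        "shard2": {
            "range": "0-7fffffff",
            "replicas": {
                "core_node2": {"base_url": "http://10.128.0.3:8983/solr"},
                "core_node3": {"base_url": "http://10.128.0.4:8983/solr", "leader": "true"},
            },
        },
    }
}

print(leader_url_for_hash(two_shards, -5))   # a negative hash falls in shard1
print(leader_url_for_hash(two_shards, 42))   # a non-negative hash falls in shard2
```

The ranges are treated as signed because the single-shard range in the
quoted output runs from 80000000 up to 7fffffff, which only makes sense if
80000000 is the most negative 32-bit value.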

Is my terminology cross threaded?

Oh well, time to build my first cluster. I wrote all my clients with
single-shard collections on a standalone server. Now I need to make sure my
app is not a cluster buster.

I feel like I am on the right path.

Thanks and Best,

GW

On 18 December 2016 at 09:53, Dorian Hoxha <dorian.ho...@gmail.com> wrote:

> On Sun, Dec 18, 2016 at 3:48 PM, GW <thegeofo...@gmail.com> wrote:
>
> > Yeah,
> >
> >
> > I'll look at the proxy you suggested shortly.
> >
> > I've discovered that the idea of making a zookeeper aware app is pointless
> > when scripting REST calls right after I installed libzookeeper.
> >
> > Zookeeper is there to provide the zookeeping for Solr: End of story. Me
> > thinks....
> >
> > I believe what really has to happen is: connect to the admin API to get
> > status
> >
> > /solr/admin/collections?action=CLUSTERSTATUS
> >
> > I think it is more sensible to make a cluster aware app.
> >
> > <lst name="Merchants">
> >   <str name="replicationFactor">1</str>
> >   <lst name="shards">
> >     <lst name="shard1">
> >       <str name="range">80000000-7fffffff</str>
> >       <str name="state">active</str>
> >       <lst name="replicas">
> >         <lst name="core_node1">
> >           <str name="core">FrogMerchants_shard1_replica1</str>
> >           <str name="base_url">http://10.128.0.2:8983/solr</str>
> >           <str name="node_name">10.128.0.2:8983_solr</str>
> >           <str name="state">active</str>
> >           <str name="leader">true</str>
> >         </lst>
> >       </lst>
> >     </lst>
> >   </lst>
> >
> > I can get an array of nodes that have a state of active. So if I have 7
> > nodes that are state = active, I will have those in an array. Then I can
> > use the rand() function with the array count to select a node/url to post
> > a json string. It would eliminate the need for a load balancer. I think.....
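
A rough sketch of collecting that array of active nodes, in Python for
illustration. It assumes the CLUSTERSTATUS response was requested as JSON and
that the collection entry has the same shape as the XML above; the function
name, the optional live_nodes filter and the second replica (10.128.0.3,
marked down) are assumptions added for the example. As far as I know a
replica can still be listed as active after its node has gone away, which is
why the filter against the live node list is offered.

```python
import random


def active_base_urls(collection, live_nodes=None):
    """Collect the base_url of every replica whose state is "active".

    If live_nodes is given, replicas whose node_name is not in it are skipped.
    """
    urls = set()
    for shard in collection["shards"].values():
        for replica in shard["replicas"].values():
            if replica.get("state") != "active":
                continue
            if live_nodes is not None and replica.get("node_name") not in live_nodes:
                continue
            urls.add(replica["base_url"])
    return sorted(urls)


# First replica mirrors the output above; the second one is hypothetical.
merchants = {
    "shards": {
        "shard1": {
            "range": "80000000-7fffffff",
            "state": "active",
            "replicas": {
                "core_node1": {
                    "base_url": "http://10.128.0.2:8983/solr",
                    "node_name": "10.128.0.2:8983_solr",
                    "state": "active",
                    "leader": "true",
                },
                "core_node2": {
                    "base_url": "http://10.128.0.3:8983/solr",
                    "node_name": "10.128.0.3:8983_solr",
                    "state": "down",
                },
            },
        }
    }
}

targets = active_base_urls(merchants)
# random.choice picks a valid element, so there is no index arithmetic to get wrong.
url = random.choice(targets)
print(url)
```
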
> >
> If you send to a random node, there is a high chance (increasing with the
> number of nodes/shards) that the node won't host the shard leader, so it
> will forward the request to the leader. What you can do instead is compute
> the hash of the 'id' field locally. With the hash you get the shard (because
> each shard has a hash range), with the shard you find the leader, and from
> the cluster status you find which node the leader is on. Then you send the
> request directly to that node and can be certain that it won't be forwarded
> again (fewer network hops).
>
>
> > //pseudo code
> >
> > $array_count = count($active_nodes);
> >
> > // rand() includes both bounds, so the highest valid index is count - 1
> > $url_target = rand(0, $array_count - 1);
> >
> > // create a function to pull the url, something like
> >
> >
> > $url = get_solr_url($url_target);
> >
> > I have a test server on my bench. I'll spin up a 5 node cluster today, get
> > my app cluster aware and then get into some Solr indexes with Vi and
> > totally screw with some shards.
> >
> > If I am correct I will post again.
> >
> > Best,
> >
> > GW
> >
> > On 15 December 2016 at 12:34, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 12/14/2016 7:36 AM, GW wrote:
> > > > I understand accessing solr directly. I'm doing REST calls to a single
> > > > machine.
> > > >
> > > > If I have a cluster of five servers and say three Apache servers, I can
> > > > round robin the REST calls to all five in the cluster?
> > > >
> > > > I guess I'm going to find out. :-)  If so I might be better off just
> > > > running Apache on all my solr instances.
> > >
> > > If you're running SolrCloud (which uses zookeeper) then sending multiple
> > > query requests to any node will load balance the requests across all
> > > replicas for the collection.  This is an inherent feature of SolrCloud.
> > > Indexing requests will be forwarded to the correct place.
> > >
> > > The node you're sending to is a potential single point of failure, which
> > > you can eliminate by putting a load balancer in front of Solr that
> > > connects to at least two of the nodes.  As I just mentioned, SolrCloud
> > > will do further load balancing to all nodes which are capable of serving
> > > the requests.
> > >
> > > I use haproxy for a load balancer in front of Solr.  I'm not running in
> > > Cloud mode, but a load balancer would also work for Cloud, and is
> > > required for high availability when your client only connects to one
> > > server and isn't cloud aware.
> > >
> > > http://www.haproxy.org/
> > >
> > > Solr includes a cloud-aware Java client that talks to zookeeper and
> > > always knows the state of the cloud.  This eliminates the requirement
> > > for a load balancer, but using that client would require that you write
> > > your website in Java.
> > >
> > > The PHP clients are third-party software, and as far as I know, are not
> > > cloud-aware.
> > >
> > > https://wiki.apache.org/solr/IntegratingSolr#PHP
> > >
> > > Some advantages of using a Solr client over creating HTTP requests
> > > yourself:  The code is easier to write, and to read.  You generally do
> > > not need to worry about making sure that your requests are properly
> > > escaped for URLs, XML, JSON, etc.  The response to the requests is
> > > usually translated into data structures appropriate to the language --
> > > your program probably doesn't need to know how to parse XML or JSON.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
