On 12/11/2015 8:19 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Thanks for all of your clarifications. I know that SolrCloud is a much
> better configuration than any other, but it is also considerably more
> complex. I just want to share the pain points I noticed while
> gathering all the information I could find on SolrCloud.
> 
> 1) The zookeeper documentation says that for the best experience you should
> have a dedicated filesystem for persistence and that it should never swap to
> disk. I have not found any guidelines on how to size a zookeeper
> machine: how much RAM, how much disk? Can I install zookeeper on the same
> machines where Solr resides? (I suspect not, because the Solr machines are
> under stress, and if zookeeper starts swapping it can lead to problems.)

Standalone zookeeper doesn't require much in the way of resources.
Unless the SolrCloud installation is enormous, a machine with 1-2GB of
RAM is probably plenty, if the only thing it is doing is zookeeper and
it's not running Windows.  If the SolrCloud install has a lot of
collections, shards, and/or servers, then you might need more, because
the zookeeper database will be larger.

> 2) What about upgrades? If I need to upgrade my SolrCloud instance and the
> new version requires a new version of zookeeper, what is the right path? Do I
> need to upgrade zookeeper first, or upgrade Solr on the existing machines
> first? Maybe I did not search well, but I did not find a comprehensive guide
> explaining how to upgrade a SolrCloud installation in various situations.

If you're following recommendations and using standalone zookeeper, then
upgrading it is entirely separate from upgrading Solr.  It's probably a
good idea to upgrade your three (or more) zookeeper servers first.

Here's a FAQ entry from zookeeper about upgrades:

https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6

> 3) What are the best practices for running DIH in SolrCloud? I think I can
> trigger DIH imports round-robin across the different servers in the cloud
> infrastructure, or is there a better way? (I probably need to trigger
> a DIH import every 5 to 10 minutes, but the number of new records is really
> small.)

When checking the status of an import, you must send the status request
to the same machine where you sent the command to start the import.

If you're only ever going to run one DIH at a time, then I don't see any
reason to involve multiple servers.  If you want to run more than one
simultaneously, then you might want to run each one on a different machine.
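
The same-machine rule for status requests can be sketched roughly like
this. It is only an illustration: the host name, the collection name, and
the "/dataimport" handler path are assumptions (the handler path is
whatever your solrconfig.xml defines), and the code only builds URLs
rather than sending them.

```python
from urllib.parse import urlencode


def dih_url(node, collection, command, handler="/dataimport"):
    """Build a DataImportHandler URL for one specific Solr node.

    The handler path is an assumption; use the one from solrconfig.xml.
    """
    query = urlencode({"command": command, "wt": "json"})
    return f"{node.rstrip('/')}/{collection}{handler}?{query}"


class DihJob:
    """Remembers which node received the import command."""

    def __init__(self, node, collection):
        self.node = node
        self.collection = collection

    def start_url(self, command="delta-import"):
        return dih_url(self.node, self.collection, command)

    def status_url(self):
        # Status must go to the same node that is running the import.
        return dih_url(self.node, self.collection, "status")


# Hypothetical node and collection names, for illustration only.
job = DihJob("http://solr1.example.com:8983/solr", "mycollection")
print(job.start_url())
print(job.status_url())
```

Keeping the node inside the job object means a round-robin scheduler can
pick a different server for each import and still ask the right machine
for its status.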

> 4) Since I believe it is not best practice to install zookeeper on the same
> machine as Solr (as a separate process, not the built-in zookeeper), I need
> at least three more machines to maintain, monitor, and upgrade. I also need
> to monitor zookeeper, a new component that the IT infrastructure team has to
> master.

The only real reason to avoid zookeeper and Solr on the same machine is
performance under high load, and mostly that comes down to I/O
performance, so if you can put zookeeper on a separate set of disks,
you're probably good.  If the query/update load will not be high, then
sharing machines will likely work well, even if the disks are all shared.

> Are there any guidelines on how to automate promoting a slave to master in a
> classic master/slave setup? I did not find anything official. I ask because
> automatically promoting a slave to master could solve my problem.

I don't know of any explicit information explaining how to promote a new
master.  Basically what you have to do is reconfigure the new master's
replication (so it stops trying to be a slave), reconfigure every slave
to point to the new master, and reconfigure every client that makes
index updates.  DNS changes *might* be able to automate the slave and
update client reconfig, but the master reconfig requires changing Solr's
configuration, which at the very least will require reloading or
restarting that server.  That could be automated, but it's up to you to
write the automation.
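
As a starting point for that automation, the three reconfiguration steps
can be written down as an ordered plan. This is a minimal sketch, not a
working failover tool: the function name and the server names are
hypothetical, and each step still has to be carried out by your own
scripts (config edits, reloads or restarts, DNS or client changes).

```python
def promotion_plan(new_master, slaves, update_clients):
    """Return the ordered manual steps for promoting a slave to master.

    All names passed in are placeholders for your own servers and clients.
    """
    steps = [
        f"{new_master}: reconfigure replication so it stops acting as a "
        "slave, then reload or restart"
    ]
    for slave in slaves:
        if slave != new_master:
            # Every remaining slave must replicate from the new master.
            steps.append(f"{slave}: point replication at {new_master}")
    for client in update_clients:
        # Anything that sends index updates must target the new master.
        steps.append(f"{client}: send index updates to {new_master}")
    return steps


plan = promotion_plan("solr2", ["solr2", "solr3", "solr4"], ["indexer"])
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

The master is reconfigured first so that slaves and update clients are
never pointed at a server that still thinks it is a slave.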

Thanks,
Shawn
