We have created some scripts that can do this for you. Basically, they reconstruct solr.xml, core.properties etc. on the new machine (by looking at the information in ZK) as they were on the machine that crashed. Our procedure when a machine crashes is:
* Remove it from the rack and replace it with a similar machine with the same hostname/IP
* Run the scripts, pointing out the IP of the machine that needs to have solr.xml and core.properties written
* Start Solr on this machine - it now runs the same set of replicas that the crashed machine did

I guess they will sync automatically with their sister replicas, but I do not know, because we do not use replication.
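
To give an idea of what such a reconstruction step could look like, here is a rough sketch in Python. It is not one of the scripts mentioned above. It assumes the cluster state has already been read from ZK and parsed into a dict laid out like the Solr 4.x clusterstate.json (collection -> "shards" -> shard -> "replicas" -> core node), and that the core.properties keys name, collection, shard and coreNodeName are enough for your version. The function name and the sample data are made up for illustration; check a core.properties on one of your healthy nodes before relying on it.

```python
# Hypothetical sample, shaped like a Solr 4.x clusterstate.json after parsing.
SAMPLE_CLUSTERSTATE = {
    "c1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {"core": "c1_shard1_replica1",
                                   "node_name": "10.0.0.1:8983_solr"},
                    "core_node2": {"core": "c1_shard1_replica2",
                                   "node_name": "10.0.0.2:8983_solr"},
                }
            },
            "shard2": {
                "replicas": {
                    "core_node3": {"core": "c1_shard2_replica1",
                                   "node_name": "10.0.0.2:8983_solr"},
                    "core_node4": {"core": "c1_shard2_replica2",
                                   "node_name": "10.0.0.3:8983_solr"},
                }
            },
        }
    }
}


def core_properties_for_node(clusterstate, node_name):
    """Return {core name: core.properties text} for every replica that the
    cluster state assigns to node_name."""
    result = {}
    for collection, coll_data in clusterstate.items():
        for shard, shard_data in coll_data.get("shards", {}).items():
            for core_node, replica in shard_data.get("replicas", {}).items():
                if replica.get("node_name") != node_name:
                    continue  # replica lives on some other machine
                core = replica["core"]
                lines = [
                    "name=" + core,
                    "collection=" + collection,
                    "shard=" + shard,
                    "coreNodeName=" + core_node,
                ]
                result[core] = "\n".join(lines) + "\n"
    return result


if __name__ == "__main__":
    # Print what would be written for the replaced machine.
    props = core_properties_for_node(SAMPLE_CLUSTERSTATE, "10.0.0.2:8983_solr")
    for core in sorted(props):
        print("# <solr home>/%s/core.properties" % core)
        print(props[core])
```

Each returned text would be written to a core.properties file in a directory named after the core under the Solr home of the new machine, before Solr is started there. Reading the state out of ZK and writing solr.xml are left out of the sketch.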

I might be able to find something for you. Which version are you using? I have some scripts that work on 4.0 and some others that work on 4.4 (and maybe later).

Regards, Per Steffensen

On 28/02/14 16:17, Jan Van Besien wrote:
Hi,

I am a bit confused about how solr cloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.

Say I have a solr cloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.

However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname, ip, as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
solr, and connect it to zookeeper.

Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.

Thanks,
Jan

