We have created some scripts that can do this for you. Basically, they
reconstruct (by looking at information in ZK) solr.xml, core.properties
etc. on the new machine as they were on the machine that crashed. Our
procedure when a machine crashes is:
* Remove it from the rack and replace it with a similar machine with the
same hostname/IP
* Run the scripts, pointing them at the IP of the machine that needs to
have solr.xml and core.properties written
* Start Solr on this machine. It now runs the same set of replicas that
the crashed machine did. I guess they will sync automatically with their
sister replicas, but I do not know, because we do not use replication.
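
To illustrate the second step, here is a minimal Python sketch of how
core.properties files could be rebuilt from cluster state. It is not the
actual script mentioned above. It assumes the Solr 4.x clusterstate.json
layout (collection -> shards -> replicas, each replica carrying "core" and
"node_name") and 4.4-style core discovery. The function name, the example
collection "demo" and the node addresses are hypothetical, and fetching
/clusterstate.json from ZooKeeper is left out.

```python
import json


def core_properties_for_node(clusterstate, node_name):
    """Return {core_name: core.properties text} for every replica that
    the given cluster state assigns to node_name."""
    result = {}
    for collection, coll_state in clusterstate.items():
        for shard, shard_state in coll_state.get("shards", {}).items():
            replicas = shard_state.get("replicas", {})
            for core_node_name, replica in replicas.items():
                if replica.get("node_name") != node_name:
                    continue  # replica lives on another machine
                lines = [
                    "name=" + replica["core"],
                    "collection=" + collection,
                    "shard=" + shard,
                    "coreNodeName=" + core_node_name,
                ]
                result[replica["core"]] = "\n".join(lines) + "\n"
    return result


# Hypothetical cluster state: two shards, two nodes, one replica each.
EXAMPLE_STATE = json.loads("""
{"demo": {"shards": {
  "shard1": {"replicas": {
    "core_node1": {"core": "demo_shard1_replica1",
                   "node_name": "10.0.0.1:8983_solr"},
    "core_node2": {"core": "demo_shard1_replica2",
                   "node_name": "10.0.0.2:8983_solr"}}},
  "shard2": {"replicas": {
    "core_node3": {"core": "demo_shard2_replica1",
                   "node_name": "10.0.0.1:8983_solr"},
    "core_node4": {"core": "demo_shard2_replica2",
                   "node_name": "10.0.0.2:8983_solr"}}}}}}
""")

if __name__ == "__main__":
    crashed = "10.0.0.1:8983_solr"  # node being replaced (hypothetical)
    found = core_properties_for_node(EXAMPLE_STATE, crashed)
    for core, text in sorted(found.items()):
        print("# " + core + "/core.properties")
        print(text)
```

Each returned text would be written to <solr home>/<core name>/core.properties
on the replacement machine before Solr is started there.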
I might be able to find something for you. Which version are you using -
I have some scripts that work on 4.0 and some other scripts that work
for 4.4 (and maybe later).
Regards, Per Steffensen
On 28/02/14 16:17, Jan Van Besien wrote:
Hi,
I am a bit confused about how solr cloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.
Say I have a solr cloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.
However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname and IP as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
solr, and connect it to zookeeper.
Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.
Thanks,
Jan