On 25/03/15 15:03, Ian Rose wrote:
Per - Wow, 1 trillion documents stored is pretty impressive. One
clarification: when you say that you have 2 replicas per collection on each
machine, what exactly does that mean? Do you mean that each collection is
sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards
per machine)?
Yes
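To make the layout concrete: 50 shards spread evenly over 25 machines gives 2 shards per machine. The snippet below is a minimal sketch that assumes simple round-robin placement. The helper `place_shards` and the `machine-NN` names are hypothetical and are not how Solr itself assigns shards.

```python
from __future__ import annotations

from collections import Counter


def place_shards(num_shards: int, machines: list[str]) -> dict[int, str]:
    """Hypothetical round-robin placement: shard i goes to machine i mod len(machines)."""
    return {shard: machines[shard % len(machines)] for shard in range(num_shards)}


# 25 hypothetical machines, 50 shards for one collection
machines = [f"machine-{m:02d}" for m in range(25)]
layout = place_shards(50, machines)

# Count how many shards of the collection land on each machine
per_machine = Counter(layout.values())
print(per_machine["machine-00"])
print(sorted(set(per_machine.values())))
```

Every machine ends up with exactly 2 shards of the collection, and because there is no replication, each shard lives on exactly one machine.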
Or are some of these slave replicas (e.g. 25x sharding with
1 replica per shard)?
No replication. It does not work very well, at least not in Solr 4.4.0.
Besides that, I am not a big fan of having two (or more) machines do all
the indexing work and then having to keep them synchronized. Instead, use
a distributed file system that keeps multiple copies of every piece of
data (like HDFS) for HA at the data level. Have only one Solr-node handle
the indexing into a particular shard; if this Solr-node breaks down, let
another Solr-node take over the indexing "leadership" on this shard.
Besides the indexing Solr-node, several other Solr-nodes can serve data
from this shard, just watching the data-folder (and commits) written by
the indexing leader of this particular shard. That will give you HA at
the service level. That is probably how we are going to do HA, pretty
soon. But that is another story.
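The single-indexing-leader idea can be illustrated with a toy, in-memory model. It is a minimal sketch only and does not use any Solr, HDFS or ZooKeeper API. The class `Shard`, its methods `elect_leader` and `searchers`, and the node names are all hypothetical. It also ignores the hard part of real failover, which is making sure a leader that looks dead has really stopped writing before another node takes over.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical model of one shard whose data-folder lives on a shared, replicated file system."""
    name: str
    candidates: list[str]        # Solr-nodes that can reach the shard's data-folder
    leader: str | None = None    # the single node allowed to index into this shard

    def elect_leader(self, live: set[str]) -> str:
        """Keep the current leader while it is alive; otherwise hand indexing
        leadership to the first live candidate."""
        if self.leader in live:
            return self.leader
        for node in self.candidates:
            if node in live:
                self.leader = node
                return node
        raise RuntimeError(f"no live node can take over indexing for {self.name}")

    def searchers(self, live: set[str]) -> list[str]:
        """All other live candidates only serve queries, reading the leader's commits."""
        return [n for n in self.candidates if n in live and n != self.leader]


# Usage: three nodes can reach the shard's data-folder
shard = Shard("shard1", ["node-a", "node-b", "node-c"])
live = {"node-a", "node-b", "node-c"}
first_leader = shard.elect_leader(live)
first_searchers = shard.searchers(live)

# node-a breaks down; node-b takes over the indexing leadership
live.discard("node-a")
second_leader = shard.elect_leader(live)
second_searchers = shard.searchers(live)
print(first_leader, first_searchers, second_leader, second_searchers)
```

Data-level HA comes from the file system keeping multiple copies, so a leader change only moves who writes, not where the index lives.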
Thanks!
No problem