Re: Terminology question: Core vs. Collection vs...

Jack Krupansky Fri, 04 Jan 2013 05:55:28 -0800

Replication makes perfect sense even if our explanations so far do not.


A shard is an abstraction of a subset of the data for a collection.

A replica is an instance of the data of the shard and instances of Solr servers that have indicated a readiness to service queries and updates for the data. Alternatively, a replica is a node which has indicated a readiness to receive and serve the data of a shard, but may not have any data at the moment.

Let's describe it operationally for SolrCloud: If data comes in to any replica of a shard, it will automatically and quickly be "replicated" to all other replicas of the shard. If a new replica of a shard comes up, it will be streamed all of the data from another replica of the shard. If an existing replica of a shard restarts or reconnects to the cluster, it will be streamed, from another replica of the shard, any new data that arrived since it was last updated.
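As a rough illustration of the three operational cases above (an update arriving at any replica, a new replica joining, an existing replica reconnecting), here is a toy Python model. It is only a sketch of the behavior described in this message, not of how Solr implements it: the Shard and Replica classes, the shard-level update history, and the "applied" counter are all invented for the example.

```python
# Toy model only; every name here is hypothetical and none of it is Solr API.
class Replica:
    def __init__(self, name):
        self.name = name
        self.docs = {}       # doc id -> document held by this replica
        self.applied = 0     # how many shard updates this replica has applied
        self.online = True


class Shard:
    def __init__(self):
        self.replicas = []
        # Simplification: one ordered update history kept per shard, standing
        # in for "the data another replica of the shard can stream to you".
        self.updates = []

    def _catch_up(self, replica):
        # Stream every update the replica has not applied yet.
        for doc_id, doc in self.updates[replica.applied:]:
            replica.docs[doc_id] = doc
        replica.applied = len(self.updates)

    def add_replica(self, replica):
        # Case 2: a new replica is streamed all of the shard's data.
        self._catch_up(replica)
        self.replicas.append(replica)

    def update(self, via, doc_id, doc):
        # Case 1: data arriving at any online replica reaches all the others.
        if via not in self.replicas or not via.online:
            raise ValueError("update must arrive at an online replica")
        self.updates.append((doc_id, doc))
        for replica in self.replicas:
            if replica.online:
                self._catch_up(replica)

    def reconnect(self, replica):
        # Case 3: a returning replica receives only what it missed.
        replica.online = True
        self._catch_up(replica)
```

With two replicas a and b added to a Shard, an update sent through b shows up in a.docs as well; if b goes offline, misses an update, and then reconnects, b.docs matches a.docs again.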

Replication is simply the process of assuring that all replicas are kept up to date. That's the same abstract meaning as for Master/Slave even though the operational details are somewhat different. The goal remains the same.

Replication factor is the number of instances of the data of the shard and instances of Solr servers that can service queries and updates for the data. Alternatively, the replication factor is the number of nodes of the SolrCloud cluster which have indicated a readiness to receive and serve the data of a shard, but may not have any data at the moment.

A node is an instance of Solr running in a Java JVM that has indicated to the Zookeeper ensemble of a SolrCloud cluster that it is ready to be a replica for a shard of a collection. [The latter part of that is a bit too fuzzy - I'm not sure what the node tells Zookeeper and who does shard assignment. I mean, does a node explicitly say what shard it wants to be, or is that assigned by Zookeeper, or is that a node's choice/option? But none of that changes the fact that a node "registers" with Zookeeper and then somehow becomes a replica for a shard.]

A node (instance of a Solr server) can be a replica of shards from multiple collections (potentially multiple shards per collection). A node is not a replica per se, but a container that can serve multiple collections. A node can serve as multiple replicas, each of a different collection.
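To make the relationships between these terms concrete, here is a small Python sketch of the definitions as I have stated them: a replica belongs to one shard of one collection and lives on one node, a node can host several replicas, and the replication factor of a shard is the number of its replicas. The class names, method names, and the node and collection names in the usage note are assumptions made up for the illustration; none of this is Solr code or Solr API.

```python
from dataclasses import dataclass, field


# Hypothetical model of the terminology; not Solr's internal representation.
@dataclass(frozen=True)
class Replica:
    collection: str   # collection the shard belongs to
    shard: str        # shard (a subset of the collection's data)
    node: str         # Solr instance (JVM) hosting this replica


@dataclass
class Cluster:
    replicas: list = field(default_factory=list)

    def add(self, collection, shard, node):
        # A node "registers" and becomes a replica for a shard.
        self.replicas.append(Replica(collection, shard, node))

    def replication_factor(self, collection, shard):
        # Number of replicas currently serving this shard.
        return sum(
            1 for r in self.replicas
            if r.collection == collection and r.shard == shard
        )

    def hosted_by(self, node):
        # A node is a container: list every (collection, shard) it serves.
        return [(r.collection, r.shard) for r in self.replicas if r.node == node]
```

For example, if "node1" is added as a replica of ("products", "shard1") and of ("logs", "shard1"), and "node2" is added as a replica of ("products", "shard1"), then replication_factor("products", "shard1") is 2 and hosted_by("node1") lists two replicas from two different collections.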

My only interest here on this user list is to understand and explain the terms we have today and that SEEM to be working for the most part, even though we may not have defined them carefully enough and used them consistently enough.

If somebody wants to propose an alternative terminology - fine, discuss that on the dev list and/or file a Jira.

I won't claim that my definitions are perfect (yet), but perfecting the definitions (for users) should be separated from changing the terms themselves.


-- Jack Krupansky

-----Original Message-----
From: Per Steffensen

Sent: Friday, January 04, 2013 2:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...

On 1/3/13 5:58 PM, Walter Underwood wrote:

A "factor" is multiplied, so multiplying the leader by a replicationFactor of 1 means you have exactly one copy of that shard.
I think that recycling the term "replication" within Solr was confusing, but it is a bit late to change that.
wunder

Yes, the term "factor" is not misleading, but the term "replication" is.
If we keep calling shard instances "Replica", I guess "replicaFactor"
will be OK - at least much better than "replicationFactor". But it would
still be better with e.g. "ShardInstance" and "InstancesPerShard".
