Yes. That's it. It's clear if we separate logical terms from physical terms. A 
simple cake diagram on the wiki, perhaps along with a UML diagram, will 
solidify these concepts.


Sent from my Verizon Wireless 4G LTE Smartphone

-------- Original message --------
From: Jack Krupansky <j...@basetechnology.com> 
Date:  
To: solr-user@lucene.apache.org,darren <dar...@ontrenet.com> 
Subject: Re: Terminology question: Core vs. Collection vs... 
 
I thought about adding Solr core, but it only muddies the water. Yes, it 
needs to be added, but carefully.

In the context of SolrCloud, a Solr core is the underlying representation of 
a replica. Alternatively, a replica of a shard of a collection is 
implemented as a Solr core. [Need to factor in the potential for multiple 
shards on a single node.] Or, a Solr core is capable of serving as a replica 
of a shard. A Solr core has a collection name but can exist without being 
registered with Zookeeper, so it may not be a replica of a 
zookeeper-registered collection.

Something like that. Not quite there yet.

The main point, I think, is that when we talk about SolrCloud or a Solr 
cluster it would be better for people to speak of replicas and shards and 
collections than cores since core is the implementation rather than the 
abstraction. I mean, at the level of cores, they know of only documents and 
fields, not shards, replicas, and the overall structure of collections and 
the cluster. Sure, the core has the name of the collection, but cores on 
other nodes can use that same name.

-- Jack Krupansky

-----Original Message----- 
From: darren
Sent: Friday, January 04, 2013 9:00 AM
To: j...@basetechnology.com ; solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...

This is a good explanation and makes sense. The one inconsistency is 
referring to a replica of a shard that has no replication. But it's not that 
big of a problem. If you wove the term 'core' into your writeup below, it 
would be complete and should be posted on the wiki.




-------- Original message --------
From: Jack Krupansky <j...@basetechnology.com>
Date:
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...

Replication makes perfect sense even if our explanations so far do not.

A shard is an abstraction of a subset of the data for a collection.

A replica is an instance of the data of the shard and instances of Solr
servers that have indicated a readiness to service queries and updates for
the data. Alternatively, a replica is a node which has indicated a readiness
to receive and serve the data of a shard, but may not have any data at the
moment.

Let's describe it operationally for SolrCloud: If data comes in to any
replica of a shard, it will automatically and quickly be "replicated" to all
other replicas of the shard. If a new replica of a shard comes up, it will be
streamed all of the data from another replica of the shard. If an
existing replica of a shard restarts or reconnects to the cluster, it will
be streamed updates of any new data since it was last updated from another
replica of the shard.
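
The three cases above can be sketched as a toy model in Python. This is only
an illustration of the behavior described, not how SolrCloud implements it:
the class names, the `online` flag, and the `sync` method are all made up for
the example, and leaders, transaction logs, and ZooKeeper are not modeled.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.docs = {}       # doc id -> document
        self.online = True   # hypothetical flag: is the replica reachable?


class Shard:
    def __init__(self, name):
        self.name = name
        self.replicas = []

    def _live_peer(self, exclude):
        # Any other online replica can serve as the source for catching up.
        for r in self.replicas:
            if r is not exclude and r.online:
                return r
        return None

    def add_replica(self, replica):
        # Case 2: a new replica is streamed the data of an existing replica.
        self.replicas.append(replica)
        self.sync(replica)

    def sync(self, replica):
        # Case 3: a reconnecting replica copies what it missed from a peer.
        replica.online = True
        peer = self._live_peer(exclude=replica)
        if peer is not None:
            replica.docs.update(peer.docs)

    def update(self, doc_id, doc):
        # Case 1: an update reaches every replica that is currently online.
        for r in self.replicas:
            if r.online:
                r.docs[doc_id] = doc


shard = Shard("shard1")
a, b = Replica("a"), Replica("b")
shard.add_replica(a)
shard.update("doc1", {"title": "first"})
shard.add_replica(b)             # b is streamed doc1 from a
b.online = False                 # b drops out of the cluster
shard.update("doc2", {"title": "second"})
missed = "doc2" not in b.docs    # b missed the update while it was away
shard.sync(b)                    # b reconnects and catches up from a
```

After the final `sync`, both replicas hold the same two documents, which is
all that "replication" means at this level of abstraction.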

Replication is simply the process of assuring that all replicas are kept up
to date. That's the same abstract meaning as for Master/Slave even though
the operational details are somewhat different. The goal remains the same.

Replication factor is the number of instances of the data of the shard and
instances of Solr servers that can service queries and updates for the data.
Alternatively, the replication factor is the number of nodes of the
SolrCloud cluster which have indicated a readiness to receive and serve the
data of a shard, but may not have any data at the moment.
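
As a rough illustration of the counting, here is a small Python sketch that
lays out the replicas of a hypothetical collection. The function name and the
replica naming scheme are assumptions for the example, not anything Solr
guarantees; the only point is that each shard gets `replication_factor`
instances, so the total is shards times factor.

```python
def layout(num_shards, replication_factor):
    """Return {shard name: [replica names]} for a hypothetical collection."""
    return {
        f"shard{s}": [
            f"shard{s}_replica{r}" for r in range(1, replication_factor + 1)
        ]
        for s in range(1, num_shards + 1)
    }


plan = layout(num_shards=2, replication_factor=3)
total = sum(len(replicas) for replicas in plan.values())  # 2 shards * 3 = 6

# With a factor of 1, every shard still has exactly one instance of its data.
single = layout(num_shards=2, replication_factor=1)
```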

A node is an instance of Solr running in a JVM that has indicated to
the Zookeeper ensemble of a SolrCloud cluster that it is ready to be a
replica for a shard of a collection. [The latter part of that is a bit too
fuzzy - I'm not sure what the node tells Zookeeper and who does shard
assignment. I mean, does a node explicitly say what shard it wants to be, or
is that assigned by Zookeeper, or is that a node's choice/option? But none
of that changes the fact that a node "registers" with Zookeeper and then
somehow becomes a replica for a shard.]

A node (instance of a Solr server) can be a replica of shards from multiple
collections (potentially multiple shards per collection). A node is not a
replica per se, but a container that can serve multiple collections. A node
can serve as multiple replicas, each of a different collection.
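
To make the "container" idea concrete, here is a minimal Python sketch in
which a replica is just a (collection, shard, node) triple. The collection
and node names are invented for the example, and the grouping function is
only an illustration of the terminology, not of any Solr API.

```python
from collections import defaultdict, namedtuple

# A replica is identified here by the shard it serves and the node hosting it.
Replica = namedtuple("Replica", ["collection", "shard", "node"])

replicas = [
    Replica("products", "shard1", "node1"),
    Replica("products", "shard2", "node2"),
    Replica("products", "shard1", "node2"),
    Replica("logs", "shard1", "node1"),
]


def replicas_by_node(replicas):
    """Group (collection, shard) pairs by the node that hosts them."""
    hosted = defaultdict(list)
    for r in replicas:
        hosted[r.node].append((r.collection, r.shard))
    return dict(hosted)


hosted = replicas_by_node(replicas)
```

In this layout `node1` hosts one replica from each of two collections, while
`node2` hosts replicas of two different shards of the same collection, so
neither node "is" a replica; each one contains several.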

My only interest here on this user list is to understand and explain the
terms we have today and that SEEM to be working for the most part, even
though we may not have defined them carefully enough and used them
consistently enough.

If somebody wants to propose an alternative terminology - fine, discuss that
on the dev list and/or file a Jira.

I won't claim that my definitions are perfect (yet), but perfecting the
definitions (for users) should be separated from changing the terms
themselves.

-- Jack Krupansky

-----Original Message----- 
From: Per Steffensen
Sent: Friday, January 04, 2013 2:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...

On 1/3/13 5:58 PM, Walter Underwood wrote:
> A "factor" is multiplied, so multiplying the leader by a replicationFactor
> of 1 means you have exactly one copy of that shard.
>
> I think that recycling the term "replication" within Solr was confusing,
> but it is a bit late to change that.
>
> wunder
Yes, the term "factor" is not misleading, but the term "replication" is.
If we keep calling shard instances "Replica", I guess "replicaFactor"
will be OK - at least much better than "replicationFactor". But it would
still be better with e.g. "ShardInstance" and "InstancesPerShard".
