Thanks again. (And sorry to jump into this convo)

But I had a question on your statement:

On 1/3/2013 08:07 AM Jack Krupansky wrote:
Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with each
core being a replica of the other cores within that shard of that
collection.

A collection is sharded, meaning it is distributed across cores. A shard itself
is not distributed across cores in the same sense. Rather, a shard exists on a
single core and is replicated on other cores. Is that right? The way it's worded
above, it sounds like a shard can also be sharded...
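
To make that reading concrete, here is a rough sketch of the hierarchy as I understand it: a collection is split into shards, and each shard is only copied (never split again) across cores. The class names, the "products" collection, the core names, and the host:port values are all made up for illustration; none of this is Solr's actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One core holding a complete copy of a single shard's index."""
    core_name: str
    node: str  # hypothetical host:port of the Solr instance hosting the core


@dataclass
class Shard:
    """One partition of a collection. It is replicated, not partitioned further."""
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    """A logical index made up of one or more shards."""
    name: str
    shards: List[Shard] = field(default_factory=list)

    def cores(self) -> List[str]:
        # Every core in the collection, listed shard by shard.
        return [r.core_name for s in self.shards for r in s.replicas]


def build_example() -> Collection:
    # Hypothetical layout: 2 shards, each replicated on 2 cores (4 cores total).
    shard1 = Shard("shard1", [
        Replica("products_shard1_replica1", "host1:8983"),
        Replica("products_shard1_replica2", "host2:8983"),
    ])
    shard2 = Shard("shard2", [
        Replica("products_shard2_replica1", "host1:8983"),
        Replica("products_shard2_replica2", "host2:8983"),
    ])
    return Collection("products", [shard1, shard2])
```

In this picture the sharding happens only at the collection level; the list of replicas under a shard holds copies of the same data.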


------- Original Message -------
On 1/3/2013 08:28 AM Jack Krupansky wrote:
A node is a machine in a cluster or cloud (graph). It could be a real
machine or a virtualized machine. Technically, you could have multiple
virtual nodes on the same physical "box". Each Solr replica would be on a
different node.

Technically, you could have multiple Solr instances running on a single
hardware node, each with a different port. They are simply instances of
Solr, although you could consider each Solr instance a node in a Solr cloud
as well, a "virtual" node. So, technically, you could have multiple replicas
on the same node, but that sort of defeats most of the purpose of having
replicas in the first place - to distribute the data for performance and
fault tolerance. But, you could have replicas of different shards on the
same node/box for a partial improvement of performance and fault tolerance.

A Solr "cloud" is really a cluster.

-- Jack Krupansky

-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 8:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Good write up.

And what about "node"?

I think there needs to be an official glossary of terms that is sanctioned
by the solr team and some terms still in use may need to be labeled
"deprecated". After so many years, it's still confusing.

------- Original Message -------
On 1/3/2013 08:07 AM Jack Krupansky wrote:
Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with each
core being a replica of the other cores within that shard of that
collection.

Instance is a general term, but is commonly used to refer to a running Solr
server, each of which can service any number of cores. A sharded collection
would typically require multiple instances of Solr, each with a shard of the
collection.

Multiple collections can be supported on a single instance of Solr. They
don't have to be sharded or replicated. But if they are, each Solr instance
will have a copy or replica of the data (index) of one shard of each sharded
collection - to the degree that each collection needs that many shards.

At the API level, you talk to a Solr instance, using a host and port, and
giving the collection name. Some operations will refer only to the portion
of a multi-shard collection on that Solr instance, but typically Solr will
"distribute" the operation, whether it be an update or a query, to all of
the shards of the named collection. In the case of update, the update will
be distributed to all replicas as well, but in the case of query only one
replica of each shard of the collection is needed.
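
A minimal sketch of that addressing, under stated assumptions: the helper name `select_url`, the host `localhost`, port `8983`, and the collection name `collection1` are illustrative values, and the `/solr/<collection>/select` path follows the conventional Solr request layout rather than anything stated in this thread. The sketch only builds the URL and sends nothing. Passing `distrib=false` is how a query is usually restricted to the portion of the collection on the instance being addressed.

```python
from urllib.parse import urlencode


def select_url(host: str, port: int, collection: str, query: str,
               distrib: bool = True) -> str:
    """Build a query URL for one Solr instance (hypothetical helper).

    With distrib=True the parameter is omitted, leaving Solr to fan the
    query out to one replica of every shard. With distrib=False the
    request asks for only the local portion of the collection.
    """
    params = {"q": query}
    if not distrib:
        params["distrib"] = "false"
    return f"http://{host}:{port}/solr/{collection}/select?{urlencode(params)}"


# Distributed query across all shards of the named collection.
distributed = select_url("localhost", 8983, "collection1", "title:solr")

# Query limited to the portion of the collection on this instance.
local_only = select_url("localhost", 8983, "collection1", "title:solr",
                        distrib=False)
```

Either URL targets a single instance by host and port; only the first leaves the fan-out to Solr.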

Before SolrCloud, Solr had master and slave, and the slaves were replicas
of the master, but with SolrCloud there is no master and all the replicas of
the shard are peers, although at any moment in time one of them will be
considered the "leader" for coordination purposes, but not in the sense that
it is a master of the other replicas in that shard. A SolrCloud replica is a
replica of the data, in an abstract sense, for a single shard of a
collection. A SolrCloud replica is more of an instance of the data/index.

An index exists at two levels: the portion of a collection on a single Solr
core will have a Lucene index, but collectively the Lucene indexes for the
shards of a collection can be referred to as the index of the collection. Each
replica is a copy or instance of a portion of the collection's index.

The term slice is sometimes used to refer collectively to all of the
cores/replicas of a single shard, or sometimes to a single replica as it
contains only a "slice" of the full collection data.

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 4:42 AM
To: solr-user@lucene.apache.org
Subject: Terminology question: Core vs. Collection vs...

Hello,

I am trying to understand the core Solr terminology. I am looking for
correct rather than loose meaning as I am trying to teach an example that
starts from an easy scenario and may scale to a multi-core, multi-machine
situation.

Here are the terms that seem to be all overlapping and/or crossing over in
my mind at the moment.

1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?

I tried looking through documentation, but either there is a terminology
drift or I am having trouble understanding the distinctions.

If anybody has a clear picture in their mind, I would appreciate a
clarification.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)