Ah... having multiple shards (of the same collection) on a single node
is about planning for future expansion of your cluster: create more
shards than you need today, put several of them on each node, and then
migrate them to their own nodes as the data outgrows the smaller
number of nodes. In other words, you can add nodes incrementally
without having to reindex all the data.
-- Jack Krupansky
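
The "create more shards than you need, then spread them out later" idea
above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the names place_shards and
add_node, the round-robin placement, and the "move from the fullest
node" policy are all hypothetical and are not Solr APIs. The point it
shows is that only the shard-to-node assignment changes as nodes are
added; the set of shards stays fixed, so no data is reindexed.

```python
def place_shards(shards, nodes):
    """Assign shards to nodes round-robin (illustrative policy, not Solr's)."""
    return {shard: nodes[i % len(nodes)] for i, shard in enumerate(shards)}


def add_node(placement, new_node):
    """Return a new placement with whole shards moved onto new_node.

    Shards are taken from the fullest existing nodes until new_node holds
    its fair share. The shards themselves are untouched, so the mapping of
    documents to shards does not change.
    """
    by_node = {}
    for shard, node in placement.items():
        by_node.setdefault(node, []).append(shard)
    by_node.setdefault(new_node, [])

    target = len(placement) // len(by_node)
    new_placement = dict(placement)
    while len(by_node[new_node]) < target:
        donor = max(
            (n for n in by_node if n != new_node),
            key=lambda n: len(by_node[n]),
        )
        shard = by_node[donor].pop()
        by_node[new_node].append(shard)
        new_placement[shard] = new_node
    return new_placement


def shards_per_node(placement):
    """Count how many shards each node currently holds."""
    counts = {}
    for node in placement.values():
        counts[node] = counts.get(node, 0) + 1
    return counts


# Eight shards created up front, hosted on only two nodes today.
shards = [f"shard{i}" for i in range(1, 9)]
placement = place_shards(shards, ["node1", "node2"])

# Later the cluster grows to four nodes; shards migrate, none are re-created.
grown = add_node(add_node(placement, "node3"), "node4")
```

Calling shards_per_node(placement) and shards_per_node(grown) shows the
same eight shards going from four per node to two per node.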
-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 9:18 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...
Yes. And it's worth noting that when you have multiple shards in a
single node (@deprecated), they are shards of different collections...
------- Original Message -------
On 1/3/2013 09:16 AM Jack Krupansky wrote:
And I would revise "node" to note that in SolrCloud a node is simply an
instance of a Solr server.

And, technically, you can have multiple shards in a single instance of
Solr, separating the logical sharding of keys from the distribution of
the data.

-- Jack Krupansky
-----Original Message-----
From: Jack Krupansky
Sent: Thursday, January 03, 2013 9:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...

Oops... let me word that a little more carefully:

...we are "replicating the data of each shard".

-- Jack Krupansky
-----Original Message-----
From: Jack Krupansky
Sent: Thursday, January 03, 2013 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Terminology question: Core vs. Collection vs...

No, a shard is a subset (or "slice") of the collection. Sharding is a
way of "slicing" the original data, before we talk about how the shards
get stored and replicated on actual Solr cores. Replicas are instances
of the data for a shard.

Sometimes people may loosely speak of a replica as being "a shard", but
that's just loose use of the terminology.

So, we're not "sharding shards", but we are "replicating shards".

-- Jack Krupansky
-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 8:51 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Thanks again. (And sorry to jump into this convo.)

But I had a question on your statement:

On 1/3/2013 08:07 AM Jack Krupansky wrote:
Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with
each core being a replica of the other cores within that shard of that
collection.

A collection is sharded, meaning it is distributed across cores. A
shard itself is not distributed across cores in the same sense. Rather,
a shard exists on a single core and is replicated on other cores. Is
that right? The way it's worded above, it sounds like a shard can also
be sharded...
------- Original Message -------
On 1/3/2013 08:28 AM Jack Krupansky wrote:
A node is a machine in a cluster or cloud (graph). It could be a real
machine or a virtualized machine. Technically, you could have multiple
virtual nodes on the same physical "box". Each Solr replica would be on
a different node.

Technically, you could have multiple Solr instances running on a single
hardware node, each with a different port. They are simply instances of
Solr, although you could consider each Solr instance a node in a Solr
cloud as well, a "virtual" node. So, technically, you could have
multiple replicas on the same node, but that sort of defeats most of
the purpose of having replicas in the first place - to distribute the
data for performance and fault tolerance. But, you could have replicas
of different shards on the same node/box for a partial improvement of
performance and fault tolerance.

A Solr "cloud" is really a cluster.

-- Jack Krupansky
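
The placement point in the message above (replicas of the same shard on
one box give no fault tolerance, while replicas of different shards can
share a box) can be checked with a small Python sketch. The function
name shards_at_risk and the dictionary layout are hypothetical, chosen
only for illustration; nothing here is a Solr API.

```python
def shards_at_risk(replica_nodes):
    """Return the shards whose replicas all live on a single node.

    replica_nodes maps a shard name to the list of nodes (boxes) holding
    its replicas. If every replica of a shard is on the same node, losing
    that node makes the whole shard unavailable.
    """
    return sorted(
        shard
        for shard, nodes in replica_nodes.items()
        if len(set(nodes)) < 2
    )


# Both replicas of each shard on the same box: replication buys nothing.
colocated = {
    "shard1": ["box1", "box1"],
    "shard2": ["box2", "box2"],
}

# Replicas of *different* shards share a box, but each shard spans two boxes.
spread = {
    "shard1": ["box1", "box2"],
    "shard2": ["box2", "box1"],
}

at_risk_colocated = shards_at_risk(colocated)
at_risk_spread = shards_at_risk(spread)
```

With the colocated layout every shard is reported as at risk; with the
spread layout none is, even though each box still hosts two replicas.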
-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 8:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Good write up.

And what about "node"?

I think there needs to be an official glossary of terms that is
sanctioned by the Solr team, and some terms still in use may need to be
labeled "deprecated". After so many years, it's still confusing.
------- Original Message -------
On 1/3/2013 08:07 AM Jack Krupansky wrote:
Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with
each core being a replica of the other cores within that shard of that
collection.

Instance is a general term, but is commonly used to refer to a running
Solr server, each of which can service any number of cores. A sharded
collection would typically require multiple instances of Solr, each
with a shard of the collection.

Multiple collections can be supported on a single instance of Solr.
They don't have to be sharded or replicated. But if they are, each Solr
instance will have a copy or replica of the data (index) of one shard
of each sharded collection - to the degree that each collection needs
that many shards.

At the API level, you talk to a Solr instance, using a host and port,
and giving the collection name. Some operations will refer only to the
portion of a multi-shard collection on that Solr instance, but
typically Solr will "distribute" the operation, whether it be an update
or a query, to all of the shards of the named collection. In the case
of update, the update will be distributed to all replicas as well, but
in the case of query only one replica of each shard of the collection
is needed.

Before SolrCloud, Solr had master and slave, and the slaves were
replicas of the master, but with SolrCloud there is no master and all
the replicas of the shard are peers, although at any moment of time one
of them will be considered the "leader" for coordination purposes, but
not in the sense that it is a master of the other replicas in that
shard. A SolrCloud replica is a replica of the data, in an abstract
sense, for a single shard of a collection. A SolrCloud replica is more
of an instance of the data/index.

An index exists at two levels: the portion of a collection on a single
Solr core will have a Lucene index, but collectively the Lucene indexes
for the shards of a collection can be referred to as the index of the
collection. Each replica is a copy or instance of a portion of the
collection's index.

The term slice is sometimes used to refer collectively to all of the
cores/replicas of a single shard, or sometimes to a single replica as
it contains only a "slice" of the full collection data.

-- Jack Krupansky
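
The collection / shard / replica relationship described in the message
above, and the difference between how updates and queries are
distributed, can be modeled with a toy in-memory Python sketch. It is
not Solr code: the Collection class, its method names, and the CRC32
routing are assumptions made for illustration. In particular, the
sketch assumes each document belongs to exactly one shard, chosen by a
stable hash of its id, so an update goes to every replica of that one
shard, while a query asks one replica of every shard and merges the
results.

```python
import random
import zlib


class Collection:
    """Toy model: a collection is a list of shards, each a list of replicas."""

    def __init__(self, name, num_shards, replicas_per_shard):
        self.name = name
        # Each replica is a dict of id -> document, standing in for the
        # Lucene index held by one Solr core.
        self.shards = [
            [{} for _ in range(replicas_per_shard)]
            for _ in range(num_shards)
        ]

    def _shard_for(self, doc_id):
        # Assumption: route by a stable hash of the id (CRC32 for illustration).
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def update(self, doc_id, doc):
        # An update must reach every replica of the shard that owns the doc.
        for replica in self.shards[self._shard_for(doc_id)]:
            replica[doc_id] = doc

    def query(self, predicate):
        # A query needs only one replica per shard; partial results are merged.
        hits = []
        for replicas in self.shards:
            replica = random.choice(replicas)
            hits.extend(doc for doc in replica.values() if predicate(doc))
        return hits


books = Collection("books", num_shards=2, replicas_per_shard=3)
for i in range(10):
    books.update(f"doc{i}", {"id": f"doc{i}", "n": i})

even = books.query(lambda doc: doc["n"] % 2 == 0)
```

Because every replica of a shard receives the same updates, the query
returns the same set of documents no matter which replica is picked for
each shard.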
-----Original Message-----
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 4:42 AM
To: solr-user@lucene.apache.org
Subject: Terminology question: Core vs. Collection vs...

Hello,

I am trying to understand the core Solr terminology. I am looking for
correct rather than loose meaning, as I am trying to teach an example
that starts from an easy scenario and may scale to a multi-core,
multi-machine situation.

Here are the terms that seem to be all overlapping and/or crossing over
in my mind at the moment.

1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?

I tried looking through documentation, but either there is a
terminology drift or I am having trouble understanding the
distinctions.

If anybody has a clear picture in their mind, I would appreciate a
clarification.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)