Re: Setting up to index multiple datastores

Erick Erickson Sun, 05 Mar 2017 09:36:23 -0800

bq:  Is each shard/replica/core in fact a separate instance?

No. I'm defining "instance" here as a JVM running Solr. And be careful
here, a "shard" is made up of one or more "replicas". Those replicas
may or may not be distributed amongst separate JVMs/machines. Each
replica of a given shard has the same documents in it.

A "replica" is a specialized "core". The term "replica" is generally
confined to talking about SolrCloud.

So, in SolrCloud a "collection" is made up of one or more "shards".
Each shard is made up of one or more "replicas".
A replica is a specialized "core".
Each Solr instance can host one or more "cores". I've seen hundreds of
cores hosted by a single JVM.
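
As a rough sketch of that hierarchy, here is a toy Python model. It is
not a Solr API; the class names, core names and host names are all made
up for illustration. It only shows that replicas are cores, and that
one JVM can host cores belonging to different shards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    # A replica is a core that belongs to a shard.
    # "node" stands for the JVM (Solr instance) hosting it.
    core_name: str
    node: str


@dataclass
class Shard:
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def cores_by_node(self) -> Dict[str, List[str]]:
        """Group core names by the JVM (node) that hosts them."""
        hosted: Dict[str, List[str]] = {}
        for shard in self.shards:
            for replica in shard.replicas:
                hosted.setdefault(replica.node, []).append(replica.core_name)
        return hosted


# Hypothetical layout: two shards, two replicas each, spread over two JVMs.
collection = Collection(
    name="dovecot",
    shards=[
        Shard("shard1", [Replica("dovecot_shard1_replica1", "host1:8983"),
                         Replica("dovecot_shard1_replica2", "host2:8983")]),
        Shard("shard2", [Replica("dovecot_shard2_replica1", "host1:8983"),
                         Replica("dovecot_shard2_replica2", "host2:8983")]),
    ],
)
```

Calling collection.cores_by_node() on this layout lists two cores under
each JVM, one per shard.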

bq: If I'm running on a single machine - would I then have multiple
"cores" listening on multiple ports?

No. They're each addressed by a separate URL on the same port, i.e.
http://localhost:8983/solr/core1
http://localhost:8983/solr/core2

etc.

If you have more than one JVM on a single machine, _then_ you address
them by different ports.
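
To make the addressing concrete, a minimal Python sketch. The helper
core_url is hypothetical, and port 8984 for a second JVM is only an
assumed example value.

```python
def core_url(core: str, host: str = "localhost", port: int = 8983) -> str:
    """Build the base URL of a core: the core name is part of the path."""
    return f"http://{host}:{port}/solr/{core}"


# One JVM, many cores: same port, different path.
same_jvm = [core_url("core1"), core_url("core2")]

# A second JVM on the same machine (assumed here to listen on 8984)
# is reached through a different port.
other_jvm = core_url("core1", port=8984)
```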

bq: If so - I'm thinking there'd be no benefit.

It Depends (tm). There's some loss since each core has some overhead.
There's some gain because certain operations (filterCache comes to
mind) operate over all the docs in a core, so having one big core has
its own memory costs. Not to mention that scoring happens over all the
docs in a core, so the response time may be quicker with multiple
smaller cores (yes, fq clauses help with this, but they have their own
overhead).

If you're not using SolrCloud, you can use "Transient Cores" to limit
the number of cores in memory at any given point. Smaller heap
required, better performance characteristics. That presupposes that
your usage pattern is "user signs on, searches for a bit and signs
off", i.e. you're not supporting all users searching simultaneously.
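
The idea behind transient cores can be pictured as a small
least-recently-used cache of loaded cores. The Python below is only a
toy model of that idea under my own assumptions; it is not how Solr
implements it, and TransientCoreCache and its methods are invented
names.

```python
from collections import OrderedDict
from typing import List


class TransientCoreCache:
    """Toy LRU model: keep at most max_loaded cores "in memory"."""

    def __init__(self, max_loaded: int) -> None:
        self.max_loaded = max_loaded
        self._loaded: "OrderedDict[str, str]" = OrderedDict()

    def get(self, core_name: str) -> str:
        """Return a handle for the core, loading it if necessary."""
        if core_name in self._loaded:
            # Already loaded: mark it as most recently used.
            self._loaded.move_to_end(core_name)
            return self._loaded[core_name]
        # Stand-in for the expensive step of opening the index.
        handle = f"<open index for {core_name}>"
        self._loaded[core_name] = handle
        if len(self._loaded) > self.max_loaded:
            # Over the limit: unload the least recently used core.
            self._loaded.popitem(last=False)
        return handle

    def loaded(self) -> List[str]:
        """Names of the cores currently loaded, oldest first."""
        return list(self._loaded)


cache = TransientCoreCache(max_loaded=2)
for user in ["alice", "bob", "alice", "carol"]:
    cache.get(user)
```

With a limit of two, bob's core is unloaded when carol signs on, since
alice searched more recently. That is why this only pays off when users
come and go rather than all searching at once.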

Best,
Erick

On Sun, Mar 5, 2017 at 12:13 AM, Daniel Miller <dmil...@amfes.com> wrote:
> On 3/4/2017 12:00 PM, Shawn Heisey wrote:
>>
>> On 3/3/2017 11:28 PM, Daniel Miller wrote:
>>>
>>> What I think I want is create a single collection, with a
>>> shard/replica/core per user.  Or maybe I'm wanting a separate
>>> collection per user - which would again mean a single
>>> shard/replica/core.  But it seems like each shard/replica/core is a
>>> separate instance.
>>
>> Manual sharding (implicit) is something you can do, but it does mean a
>> LOT of individual cores.  Many shards/replicas can cause just as many
>> performance issues as many collections.
>
>
> Sorry to keep hitting the same point - but I'm still not understanding.  Is
> each shard/replica/core in fact a separate instance?  If I'm running on a
> single machine - would I then have multiple "cores" listening on multiple
> ports?  If so - I'm thinking there'd be no benefit.
>
>>
>>> Without modifying Dovecot source, I can have it generate URL's like,
>>> "http://solr.server.local:8983/solr/dovecot/" (which is what I do now)
>>> or maybe, "http://solr.server.local:8983/solr/dovecot_user/" or even
>>> "http://solr.server.local:8983/solr/dovecot/dovecot_user".  But I'm
>>> not understanding how, if possible, I can have the indexes created
>>> appropriately to support such access.  The only examples I've seen use
>>> either separate ports or ip's for listeners.
>>
>> If you use shards, the shard name would be a URL parameter, not part of
>> the URL path.  Can Dovecot do that?
>
>
> Not without modifying the source - which may indeed be appropriate. What I'm
> still not clear on (actually there's a lot...) is:
>
> Without using multiple servers for redundancy or distributed search - would
> splitting the index offer any performance benefit?  If not, there's probably
> no point in continuing and digging into Dovecot internals.
>
> Daniel
>
