The first question I always ask is how you want to query the data - what is the full range of query use cases?

For example, might a customer ever want to query across all of their projects?

You didn't say how many customers you must be able to support. This leads to questions about how many customers or projects run on a single Solr server. It sounds like you may require quite a number of Solr servers, each multi-core. And in some cases a single customer might not fit on a single Solr server. SolrCloud might begin to make sense even though it sounds like a single collection would rarely need to be sharded.
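To make that concrete, here is a rough sketch (Python, with made-up server URLs and a made-up naming scheme - purely illustrative, not a recommendation) of the sort of customer/project-to-server mapping you end up maintaining once customers are spread across several multi-core Solr servers:

# Hypothetical sketch: which Solr server hosts which per-project core.
# The server list and naming convention below are assumptions for illustration only.

SOLR_SERVERS = ["http://solr1:8983/solr", "http://solr2:8983/solr"]  # assumed hosts

def core_name(customer_id, project_id):
    # One core per project, namespaced by customer.
    return "%s_%s" % (customer_id, project_id)

def server_for(customer_id):
    # Pin all of a customer's cores to one server so queries across that
    # customer's projects stay on a single node - until the customer outgrows
    # it, which is where SolrCloud and sharding would start to make sense.
    return SOLR_SERVERS[hash(customer_id) % len(SOLR_SERVERS)]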

You didn't speak at all about HA (High Availability) requirements or replication.

Or about query latency requirements or query load - which can impact replication requirements.

-- Jack Krupansky

-----Original Message----- From: Pisarev, Vitaliy
Sent: Sunday, February 9, 2014 4:22 AM
To: solr-user@lucene.apache.org
Subject: Deciding how to correctly use Solr multicore

Hello!

We are evaluating Solr usage in our organization and have come to the point where we are past the functional tests and are now looking at choosing the best deployment topology.

Here are some details about the structure of the problem: the application deals with storing and retrieving artifacts of various types. The artifacts are stored in Projects. Each project can have hundreds of thousands of artifacts (total across all types), and our largest customers have hundreds of projects (~300-800), though the vast majority have tens of projects (~30-100).

Core granularity
In terms of core granularity, it seems to me that a core per project is sensible, as pushing everything into a single core will probably be too much. The entities themselves will have a special type field for distinction. Moreover, not all of the projects may be active at a given time, so this allows their indexes to remain latent on disk.
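To illustrate what I have in mind - assuming the stock CoreAdmin HTTP API, a per-project naming convention of my own invention, and a "type" field in the schema - something along these lines (Python, untested sketch):

import requests

SOLR = "http://localhost:8983/solr"  # assumed local Solr node on each app node

def create_project_core(project_id):
    # Create a dedicated core for the project via the CoreAdmin API.
    # Assumes an instanceDir with a conf/ directory already laid out on disk.
    r = requests.get(SOLR + "/admin/cores", params={
        "action": "CREATE",
        "name": "project_%s" % project_id,         # hypothetical naming scheme
        "instanceDir": "project_%s" % project_id,  # assumed directory layout
    })
    r.raise_for_status()

def index_artifact(project_id, artifact):
    # Index one artifact into its project's core; the 'type' field
    # distinguishes artifact types within the core.
    r = requests.post(SOLR + "/project_%s/update" % project_id,
                      params={"commit": "true"},
                      json=[artifact])  # e.g. {"id": "42", "type": "defect", ...}
    r.raise_for_status()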


Availability and synchronization
Our application is deployed on premises at our customers' sites - we cannot go too crazy with the amount of extra resources we demand from them, e.g. dedicated indexing servers. We pretty much need to make do with what is already there.

For now, we are planning to use the DIH (DataImportHandler) to maintain the index. Each node in the application cluster will have its own local index. When a project is created (or the feature is enabled on an existing project), a core is created for it on each one of the nodes, a full import is executed, and then a delta import is scheduled to run on each one of the nodes. This gives us simplicity, but I am wondering about the performance and memory consumption costs. Also, I am wondering whether we should use replication for this purpose. The requirement is for the index to be updated once every 30 seconds - are delta imports designed for this?
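To show what I mean - a crude sketch (Python; the core name, the 30-second loop, and the assumption that DIH is registered at the usual /dataimport handler are all illustrative) of driving this over HTTP, one full import at core creation and then periodic delta imports:

import time
import requests

SOLR = "http://localhost:8983/solr"  # assumed local node; the same loop would run on every node

def dih(core, command):
    # Issue a DataImportHandler command (full-import, delta-import, status) to a core.
    r = requests.get(SOLR + "/" + core + "/dataimport",
                     params={"command": command, "wt": "json"})
    r.raise_for_status()
    return r.json()

core = "project_123"            # hypothetical per-project core
dih(core, "full-import")        # initial build when the core is created

while True:                     # crude scheduler; in practice this would be cron/Quartz
    time.sleep(30)              # the 30-second freshness requirement
    dih(core, "delta-import")   # picks up rows changed since the last import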

I understand that this is a very complex problem in general. I tried to highlight the most significant aspects and would appreciate some initial guidance. Note that we are planning to execute performance and stress testing no matter what, but the assumption is that the topology of the solution can be predetermined from the existing data.



