Do not use Solr as a database. It was never designed to be a database.
It is missing many features that are standard in databases.

* no transactions
* no rollback (in Solr Cloud)
* no session isolation (one client’s commit will commit all data in progress)
* no schema migration
* no version migration
* no real backups (Solr backup is a cold server, not a dump/load)
* no dump/load
* no general in-place record modification (atomic updates cover only a subset of this)
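For what it's worth, the atomic-update subset mentioned above looks roughly like this; the collection and field names here are hypothetical:

```shell
# Atomic update: change one field of an existing doc by its uniqueKey
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycoll/update?commit=true' \
  --data-binary '[{"id":"doc1","price_i":{"set":42}}]'
```

Even this only works when the other fields are stored (or have docValues), so that Solr can reconstruct the rest of the document.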

Solr assumes you can always reload all the data from a source repository. That
reload takes the place of migrations and backups.

If you use Solr as a database and lose all your data, don’t blame us. It was
never designed to do that.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 17, 2018, at 7:01 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> 
>> We are beginners with Apache Solr, and we need the following clarifications from you.
>> 
>> 
>> 
>> 1.      In SolrCloud, how can we install more than one shard on a single PC? 
> 
> You typically have one installation of Solr on each server. Then you can add 
> a collection with multiple shards, specifying how many shards you wish when 
> creating the collection, e.g.
> 
> bin/solr create -c mycoll -shards 4
> 
> Although possible, it is normally not advised to install multiple instances 
> of Solr on the same server.
> 
>> 2.      What is the maximum number of shards that can be added in SolrCloud?
> 
> There is no limit. You should find a good number based on the number of 
> documents, the size of your data, the number of servers in your cluster, 
> available RAM and disk size and the required performance.
> 
> In practice you will guess the initial #shards and then benchmark a few 
> different settings before you decide.
> Note that you can also adjust the number of shards as you go, via the 
> CREATESHARD / SPLITSHARD APIs, so even if you start out with few shards you 
> can grow later.
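As a sketch, growing a collection later via the Collections API can look like this (collection and shard names are hypothetical):

```shell
# Split shard1 of "mycoll" into two sub-shards; the parent shard
# becomes inactive once the split completes
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1'
```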
> 
>> 3.      My application has no need for ACID properties; apart from that,
>> can I use Solr as a complete database?
> 
> You COULD, but Solr is not intended to be your primary data store. You should 
> always design your system so that you can re-index all content from some 
> source (does not need to be a database) when needed. There are several use 
> cases for a complete re-index that you should consider.
> 
>> 4.      On which OS will we see better performance, Windows Server or
>> Linux?
> 
> I'd say Linux if you can. If you HAVE to, then you could also run on Windows 
> :-)
> 
>> 5.      If a Solr core contains 2 billion documents, what are the recommended
>> RAM size and Java heap space for better performance? 
> 
> It depends. It is not likely that you will ever put 2bn docs in one single 
> core. Normally you would have sharded long before that number.
> The amount of physical RAM and the amount of Java heap to allocate to Solr 
> must be calculated and decided on a per case basis.
> You could also benchmark this - test whether a larger RAM size improves 
> performance due to caching. Depending on your bottlenecks, adding more RAM 
> may be a way to scale further before needing to add more servers.
> 
> Sounds like you should consult with a Solr expert to dive deep into your 
> exact usecase and architect the optimal setup for your case, if you have 
> these amounts of data.
> 
>> 6.      I have 20 fields per document; what is the maximum number of documents
>> that can be inserted / retrieved in a single request?
> 
> No hard limit, but there are practical limits.
> For indexing (update), try various batch sizes and find which gives the 
> best performance for you. It is just as important to send updates over many 
> parallel connections as it is to use large batches.
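A rough sketch of batched, parallel updates (collection and field names are hypothetical; tune batch size and concurrency by benchmarking):

```shell
# Two batches sent over parallel connections; commit once at the end
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycoll/update' \
  --data-binary '[{"id":"1","title_s":"first"},{"id":"2","title_s":"second"}]' &
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycoll/update' \
  --data-binary '[{"id":"3","title_s":"third"},{"id":"4","title_s":"fourth"}]' &
wait
curl 'http://localhost:8983/solr/mycoll/update?commit=true'
```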
> 
> For searching, why would you want to know a maximum? Normally the use case 
> for search is to get the TOP N docs, not the maximum number.
> If you need to retrieve thousands of results, you should have a look at the 
> /export handler and/or streaming expressions.
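For illustration, an /export request can look like this (collection and field names are hypothetical; /export requires an explicit sort and an fl list of docValues-enabled fields):

```shell
# Stream the full result set instead of paging through it
curl 'http://localhost:8983/solr/mycoll/export?q=*:*&sort=id+asc&fl=id,title_s'
```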
> 
>> 7.       If I have billions of documents, and the "start" parameter is the
>> 10 millionth document with the next 100 documents requested, will this
>> raise any performance issues?
> 
> Don't do it!
> This is a warning sign that you are using Solr in a wrong way.
> 
> If you need to scroll through all docs in the index, have a look at streaming 
> expressions or cursorMark instead!
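A minimal cursorMark loop looks roughly like this (collection name is hypothetical; the sort must include the uniqueKey as a tie-breaker):

```shell
# First page: cursorMark=*
curl 'http://localhost:8983/solr/mycoll/select?q=*:*&sort=id+asc&rows=100&cursorMark=*'
# Take nextCursorMark from the response and pass it as cursorMark on the
# next request; stop when nextCursorMark equals the cursorMark you sent.
```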
> 
>> 8.      Which .NET client is best for Solr?
> 
> The only one I'm aware of is SolrNet. There may be others. None of them are 
> supported by the Solr project.
> 
>> 9.      Is there any limitation on a single field, I mean on the size of
>> blob data?
> 
> I think there is some default cutoff for very large values.
> 
> Why would you want to put very large blobs into documents?
> This is a warning flag that you may be using the search index in the wrong 
> way. Consider storing large blobs outside of the search index and 
> referencing them from the docs.
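One common pattern, sketched here with hypothetical field names and URLs, is to index only a pointer to the blob:

```shell
# Store the blob in external storage and index a URL/reference to it
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycoll/update?commit=true' \
  --data-binary '[{"id":"doc1","blob_url_s":"https://files.example.com/doc1.pdf"}]'
```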
> 
> 
> In general, it would help a lot if you started by telling us WHAT you intend 
> to use Solr for, what you are trying to achieve, what performance 
> goals/requirements you have, etc., instead of asking a lot of very specific 
> max/min questions. There are very seldom hard limits, and where there are, 
> it is usually not a good idea to approach them :)
> 
> Jan
> 
