Re: 20180917-Need Apache SOLR support

Shawn Heisey Mon, 17 Sep 2018 09:04:39 -0700

On 9/17/2018 7:04 AM, KARTHICKRM wrote:

Dear SOLR Team,


We are beginners to Apache SOLR, We need following clarifications from you.

Much of what I'm going to say is a mirror of what you were already told by Jan. All of Jan's responses are good.

1.      In SOLRCloud, How can we install more than one Shared on Single PC?

One Solr instance can run multiple indexes. Except for one specific scenario that I hope you don't run into, you should NOT run multiple Solr instances per server. There should only be one. If your query rate is very low, then you can get good performance from multiple shards per node, but with a high query rate, you'll only want one shard per node.

2.      How many maximum number of shared can be added under on SOLRCloud?

There is no practical limit. If you create enough of them (more than a few hundred), you can end up with severe scalability problems related to SolrCloud's interaction with ZooKeeper.

3.      In my application there is no need of ACID properties, other than
this can I use SOLR as a Complete Database?

Solr is NOT a database. All of its capabilities and all the optimizations it contains are geared towards search. If you try to use it as a database, you're going to be disappointed with it.

4.      In Which OS we can feel the better performance, Windows Server OS /
Linux?

From those two choices, I would strongly recommend Linux. If you have an open source operating system that you prefer to Linux, go with that.

5.      If a SOLR Core contains 2 Billion indexes, what is the recommended
RAM size and Java heap space for better performance?

I hope you mean 2 billion documents here, not 2 billion indexes. Even though technically speaking there's nothing preventing SolrCloud from handling that many indexes, you'll run into scalability problems long before you reach that many.

If you do mean documents ... don't put that many documents in one core. That number includes deleted documents, which means there's a good possibility of going beyond the actual limit if you try to have 2 billion documents that haven't been deleted.

6.      I have 20 fields per document, how many maximum number of documents
can be inserted / retrieved in a single request?

There's no limit to the number that can be retrieved. But because the entire response must be built in memory, you can run your Solr install out of heap memory by trying to build a large response. Streaming expressions can be used for really large results to avoid the memory issues.

As for the number of documents that can be inserted by a single request ... Solr defaults to a maximum POST body size of 2 megabytes. This can be increased through an option in solrconfig.xml. Unless your documents are huge, this is usually enough to send several thousand at once, which should be plenty.
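As a rough illustration of sending documents in batches that stay under a body-size limit, here is a minimal Python sketch. It assumes the documents are sent as a compact JSON array and that the 2 megabyte figure above applies to the encoded request body; the function name batch_by_size and the sample documents are hypothetical, not part of Solr or any client library.

```python
import json
from typing import Iterable, Iterator, List


def encoded_size(doc: dict) -> int:
    """Size in bytes of one document as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def batch_by_size(docs: Iterable[dict],
                  max_bytes: int = 2 * 1024 * 1024) -> Iterator[List[dict]]:
    """Group docs so each batch, encoded as a compact JSON array, fits in max_bytes.

    A single document larger than max_bytes is still yielded alone;
    the caller has to decide what to do with it.
    """
    batch: List[dict] = []
    size = 2  # the surrounding "[" and "]"
    for doc in docs:
        extra = encoded_size(doc) + (1 if batch else 0)  # +1 for the comma
        if batch and size + extra > max_bytes:
            yield batch
            batch, size = [], 2
            extra = encoded_size(doc)
        batch.append(doc)
        size += extra
    if batch:
        yield batch


# Tiny demonstration with an artificially small limit.
sample_docs = [{"id": str(i)} for i in range(10)]
sample_batches = list(batch_by_size(sample_docs, max_bytes=35))
print([len(b) for b in sample_batches])
```

Each yielded batch would then be serialized with the same compact separators and posted to the update handler in its own request.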

7.       If I have Billions of indexes, If the "start" parameter is 10th
Million index and "end" parameter is  start+100th index, for this case any
performance issue will be raised ?

Let's say that you send a request with these parameters, and the index has three shards:


start=10000000&rows=100

Every shard in the index is going to return ten million plus 100 results to the coordinating node. That's thirty million individual results. The coordinating node will combine those results, sort them, and then request full documents for the 100 specific rows that were requested. This takes a lot of time and a lot of memory.

For deep paging, use cursorMark. For large result sets, use streaming expressions. I have used cursorMark ... its only disadvantage is that you can't jump straight to page 10000, you must go through all of the earlier pages too. But page 10000 will be just as fast as page 1. I have never used streaming expressions.
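The arithmetic behind that can be sketched in a few lines of Python. This is only a back-of-the-envelope model of the description above (each shard returns start + rows entries to the coordinating node); the function name deep_paging_cost is made up for illustration.

```python
def deep_paging_cost(start: int, rows: int, shards: int) -> dict:
    """Estimate how many result entries move around for a start/rows request."""
    per_shard = start + rows  # each shard must return its top (start + rows) entries
    return {
        "per_shard": per_shard,
        "coordinator_total": per_shard * shards,  # entries merged and sorted by the coordinator
        "documents_fetched": rows,  # full documents requested at the end
    }


cost = deep_paging_cost(start=10_000_000, rows=100, shards=3)
print(cost)
```

For the example request this gives 10,000,100 entries per shard and 30,000,300 entries at the coordinating node, all to return 100 documents.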
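A minimal Python sketch of walking a result set with cursorMark follows. The request parameters (cursorMark=* on the first request, a sort that includes the uniqueKey field, and nextCursorMark in the response) are Solr's documented cursor mechanism. The URL, the collection name, and the assumption that the uniqueKey field is called "id" are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def http_fetch(base_url: str, params: dict) -> dict:
    """Send one select request and decode the JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iterate_with_cursor(fetch: Callable[[dict], dict],
                        query: str = "*:*",
                        sort: str = "id asc",  # assumes the uniqueKey field is "id"
                        rows: int = 100) -> Iterator[dict]:
    """Yield every matching document, one page at a time, using cursorMark."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        data = fetch({
            "q": query,
            "sort": sort,  # the sort must include the uniqueKey field
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Usage against a live server might look like this, with a placeholder URL: iterate_with_cursor(lambda p: http_fetch("http://localhost:8983/solr/mycollection/select", p)). Passing the fetch function in keeps the paging loop separate from the HTTP call.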

8.      Which .net client is best for SOLR?

No idea. The only client produced by this project is the Java client. All other clients are third-party, including .NET clients.

9.      Is there any limitation for single field, I mean about the size for
blob data?

There are technically no limitations here. But if your data is big enough, it begins to cause scalability problems. It takes time to read data off the disk, for the CPU to process it, etc.

In conclusion, I have much the same thing to say as Jan said. It sounds to me like you're not after a search engine, and that Solr might not be the right product for what you're trying to accomplish. I'll say this again: Solr is NOT a database.


Thanks,
Shawn
