On 9/7/2018 8:39 AM, Pavel Micka wrote:
I found on the wiki (https://wiki.apache.org/solr/SolrPerformanceProblems#RAM) that 
the optimal amount of RAM for Solr is equal to the index size. This is, let's say, 
the ideal case, with everything in memory.
I wrote that page.

We plan to have a small installation with 2 nodes and 8 shards. The cluster will 
hold 100M documents. We expect that each document will take 5kB to 
index. With an in-memory index this would mean that those two nodes would require 
~500GB RAM. This would mean 2x 256GB to have everything in memory. And those 
are really big machines... Is this calculation even correct in new Solr 
versions?

And we do have a bit restricted problem: Our data are time based logs and we 
generally have a restricted search for last 3 months. Which will match let's 
say 10M of documents. How will this affect SOLR memory requirements? Will we 
still need to have the whole inverted indexes in memory? Or is there some 
internal optimization, which will ensure that only some part will need to be in 
memory?

The questions:

1)      Is the 500GB memory requirement a correct assumption?

There are two things that Solr needs memory for.  One is Solr's heap, which is memory directly used by Solr itself.  The other is unused memory, which the operating system will use to cache data on disk.  Solr performance is helped dramatically by the latter kind of memory.

For *OPTIMAL* performance with a 500GB index, you need 500GB of memory for the OS to cache the data.  This is memory that is not used by any program, and Solr's heap counts as program memory, so the heap comes on top of that 500GB.

For *good* performance, it's rare that you will need enough memory to cache the entire index.  But I cannot tell you with any reliability how much of the index you must be able to cache.  Some people are doing fine with only a few percent of their index cached.  Others see terrible performance unless they can get 75 percent of the index cached.
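
As a rough sketch of that arithmetic, using the figures from the question (100M documents, about 5kB each, 2 nodes): the cached fractions in the loop are purely illustrative assumptions, not recommendations, and the function names are made up for this example. It also assumes the index is spread evenly across the nodes with no replicas, and it does not include Solr's heap, which has to be added on top.

```python
def index_size_gb(num_docs, bytes_per_doc):
    """Estimated on-disk index size in GB (1 GB = 1e9 bytes)."""
    return num_docs * bytes_per_doc / 1e9


def page_cache_per_node_gb(index_gb, cached_fraction, nodes):
    """Free memory each node needs so the OS can cache the given
    fraction of the index, assuming an even split across nodes."""
    return index_gb * cached_fraction / nodes


# Figures from the question: 100M documents at roughly 5kB each.
index_gb = index_size_gb(100_000_000, 5_000)

# Illustrative fractions only; the right value depends on the workload.
for fraction in (0.05, 0.25, 0.75, 1.0):
    per_node = page_cache_per_node_gb(index_gb, fraction, nodes=2)
    print(f"{fraction:.0%} of index cached: ~{per_node:.1f} GB free memory per node")
```

The only way to find out which fraction is enough for a given workload is to test with real data and real queries.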

2)      Will the fact that we have time-based logs with majority of accesses to 
recent data only help?

Yes, it most likely will help, and reduce your memory requirements.
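
A back-of-the-envelope guess at how much of the index is "hot", using the numbers from the question (about 10M of 100M documents match the last 3 months). This assumes index size scales linearly with document count and that the index data for recent documents is not interleaved on disk with older data. Neither is guaranteed, so treat the result as a lower bound to test against, not a sizing rule. The function name is hypothetical.

```python
def hot_index_gb(index_gb, hot_docs, total_docs):
    """Portion of the index (in GB) touched by queries, assuming size
    scales linearly with document count (an assumption, not a given)."""
    return index_gb * hot_docs / total_docs


# 500GB index, 10M recent documents out of 100M in total.
hot_gb = hot_index_gb(500.0, 10_000_000, 100_000_000)
print(f"Estimated hot portion of the index: ~{hot_gb:.0f} GB")
```

With those figures the hot portion comes out at roughly 50GB across the whole cluster, before adding Solr's heap.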

3)      Is there a best practice for reducing the RAM that Solr requires?

The biggest thing you can do is to reduce the size of the index, so there is less data that must be accessed for a query. The page you referenced lists some things you might be able to do to reduce Solr's heap requirements.  If you reduce the heap requirements, then more of the server's memory is available for caching.

Thanks,
Shawn
