You pretty much have it. Actually, the number you want is the "maxDoc"
figure from the admin UI screen, since each filterCache entry is
essentially a bitset over every document in the index (and maxDoc
includes deleted docs). So the formula is maxDoc/8 bytes plus some
overhead (not enough to matter), for EVERY entry.
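
For a concrete (made-up) example: with a maxDoc of 100,000,000, each
cached filter is a bitset of 100,000,000 bits, i.e. 100,000,000 / 8 =
12,500,000 bytes, or about 12.5 MB per entry.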

You'll never fit 100B docs on a single machine anyway. Lucene has a
hard limit of 2B docs, and I've never heard of anyone fitting even 2B
docs on a single machine in a performant manner. So under any
circumstance this won't all be on one machine. You have to figure it
locally for each shard. And at this size there's no doubt you'll be
sharding!

Also be very careful here: the "size" parameter in the cache
definition is the number of _entries_, NOT the number of _bytes_.
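
For reference, the cache definition lives in solrconfig.xml and looks
something like this (the values here are just the stock example
settings, not a recommendation):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>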

_Each_ entry is that size! So the cache requirements will be close to
((maxDoc/8) + 128) * (size_defined_in_the_config_file), where 128 is
an approximation of the storage necessary for the text of the fq
clause.
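
Plugging the same made-up numbers in: with maxDoc = 100,000,000 and
size = 512, that's roughly (100,000,000/8 + 128) * 512 =
12,500,128 * 512 bytes, or about 6.4 GB for a fully loaded filterCache
on that one shard.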

Best,
Erick

On Wed, Jun 18, 2014 at 8:00 AM, Benjamin Wiens
<benjamin.wi...@gmail.com> wrote:
> Hi,
> I'm looking for a formula to calculate the filterCache size in RAM.
>
> The best estimation I can find is here
> http://stackoverflow.com/questions/20999904/solr-filter-cache-fastlrucache-takes-too-much-memory-and-results-in-out-of-mem
>
> An index of 1,000,000 documents would thus take 12.5 GB of RAM with this
> formula (assuming 100,000 cached filter entries):
>
> 1,000,000 bits per entry x 100,000 entries = 100,000,000,000 bits
> / 8 (to bytes) / 1000 (to KB) / 1000 (to MB) / 1000 (to GB) = 12.5 GB
>
> Can anyone confirm this formula? I am aware that if the result set of a
> filter query is small, Solr can store it in a more compact form which
> takes up less memory.
>
> I know I can just start with a low filterCache size and kick it up in my
> environment, but I'd like to come up with a scientific formula.
>
> Thanks,
> Ben
