Shawn,

On 7/5/22 23:52, Shawn Heisey wrote:
On 7/5/2022 3:11 PM, Christopher Schultz wrote:
Well, if you need more than 32GiB, I think the recommendation is to go MUCH HIGHER than 32GiB. If you have a 48GiB machine, maybe restrict to 31GiB of heap, but if you have a TiB, go for it :)

I remember reading somewhere, likely for a different program than Solr, that the observed break-even point for 64-bit pointers was 46GB.  The level of debugging and introspection required to calculate that number would be VERY extensive.  Most Solr installs can get by with a max heap size of 31GB or less, even if they are quite large.  For those that need more, I would probably want to see a heap size of at least 64GB.  It is probably better to use SolrCloud and split the index across more servers to keep the heap requirement low than to use a really massive heap.
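
For what it's worth, the JVM will tell you directly whether compressed oops survive a given -Xmx; this is a generic HotSpot check, nothing Solr-specific, and the exact heap sizes below are just illustrative:

$ java -Xmx31g -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops
$ java -Xmx40g -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops

The first should report UseCompressedOops as true and the second as false, which is where the "stay at or below ~31GiB" advice comes from.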

This is why I said "uhh..." above: the JVM needs more memory than the heap. Sometimes as much as twice that amount, depending upon the workload of the application itself. Measure, measure, measure.

It would be interesting to see how much overhead there really is for Solr with various index sizes.  We have seen people have OOM problems when making *only* GC changes ... switching from CMS to G1.  Solr has used G1 out of the box for a while now.
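
One way to measure at least the JVM-internal part of that overhead is Native Memory Tracking; it's a stock JVM feature rather than anything Solr wires up for you, and the SOLR_OPTS line is just my guess at where you'd put the flag in solr.in.sh:

SOLR_OPTS="$SOLR_OPTS -XX:NativeMemoryTracking=summary"

Then, against the running process (as the same user the JVM runs as):

$ sudo -u solr jcmd <pid> VM.native_memory summary

That breaks the native footprint out into heap, metaspace, thread stacks, code cache, GC bookkeeping, etc. It does NOT include mmapped index files; those only show up in the process's VSZ/RSS.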

Anecdotal data point:

Solr 7.7.3
Oracle Java 1.8.0_312
Xms = Xmx = 1024M (set via solr.in.sh; see the sketch after this list)
No messing with default GC or other memory settings
1 Core, no ZK
30s autocommit
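
(For reference, the heap setting above comes out of my solr.in.sh, roughly as below; the exact file location depends on how Solr was installed, and GC_TUNE is left untouched so whatever Solr 7.7 ships as the default applies.)

SOLR_HEAP="1024m"
# equivalently:
# SOLR_JAVA_MEM="-Xms1024m -Xmx1024m"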

On-disk artifact size:
$ du -hs /path/to/core
723M    /path/to/core

Live memory info:

Solr self-reported heap memory used: 205.12 MB [*]
I reloaded the admin page after writing the "*" note below and it's reporting 55.78 MB heap used.

Using 'ps' to report real memory usage:

$ ps aux | grep '\(java\|PID\)'
USER       PID %CPU %MEM    VSZ   RSS     [...]
solr     20324  8.1  0.7 6928440 469496   [...]

So the process space is 6.6G (my 'ps' reports VSZ in kilobytes) and the resident size (aka "actual memory use") is ~460M.

Solr doesn't report a high-water mark for its heap usage, but the most I've seen so far without a GC kicking it back down is ~200M. So resident memory (~460M) is roughly double the heap usage I've actually observed, i.e. about 100% overhead on top of what the heap itself is holding.
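
The closest I can get to a high-water mark without attaching a profiler is sampling the heap from the outside with stock JDK tooling (nothing Solr-specific; the 5-second interval is arbitrary):

$ jstat -gcutil 20324 5s

Watching the eden (E) and old-gen (O) columns over time, plus when full GCs fire, gives a rough ceiling; a JMX client like VisualVM or jconsole will chart the same thing.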

I see lots of memory mapped files (both JAR libraries and index-related files) when I do:

$ sudo lsof -p 20324

So I suspect a lot of those are mapped into that resident process space. mmap is one of those things that eats up tons of non-heap memory and doesn't count toward the Xms/Xmx limit. That's probably why people run out of memory so often: they think they can allocate huge amounts of heap on their big machine, when what they really need is native memory and not quite so much heap.
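
To see how much of that RSS is the mapped files versus the heap and other anonymous memory, something like this works for me (the pmap column layout is from Linux procps, so the sort column is an assumption about your system):

$ sudo pmap -x 20324 | sort -k3 -n | tail -20

That lists the mappings with the largest resident share; index files and JARs show up by name, while the Java heap shows up as big [ anon ] regions.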

[*] I recently restarted Solr because my personal TLS client key had expired; I had to mint a new one and install it. I'd really love to know if Solr/Jetty can re-load its TLS configuration without restarting. It's a real drag to bounce Solr for something so mundane.

I'm interested to know what the relation is between on-disk index size and in-memory index size. I would imagine that the on-disk artifacts are fairly slim (only storing what is necessary) and the in-memory representation has all kinds of "waste" (like pointers and all that). Has anyone done a back-of-the-napkin calculation to guess at the in-memory size of an index given the on-disk representation?

That is an interesting question.  One of the reasons Lucene queries so fast when there is plenty of memory is because it accesses files on disk directly with MMAP, so there is no need to copy the really massive data structures into the heap at all.

This is likely where lots of that RSS space is being used in my process detailed above.
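
A quick way to confirm that (assuming the usual data/index layout under the core directory) is to count how many of the index files the process currently has mapped:

$ sudo grep -c '/path/to/core/data/index' /proc/20324/maps

Each line in /proc/<pid>/maps is one mapping, so a healthy count there is Lucene's MMapDirectory at work.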

I believe the OP is having problems because they need a total memory size far larger than 64GB to handle 500GB of index data, and they should also have dedicated hardware for Solr so there is no competition with other software for scarce system resources.

Having never come close to busting my heap with my tiny (~700M on-disk) index, I'm curious about Solr's expected performance with a huge index and small memory. Will Solr just "get by with what it has" or will it really crap itself if the index is too big? I was kinda hoping it would just perform awfully because it has to keep going back to the disk.

-chris
