Hi Antonio,

Passing &facet.limit=Integer.MAX_VALUE or rows=Integer.MAX_VALUE might be
the root cause of the issue you're encountering.

What's likely happening is that, when Solr receives such large parameter
values, it sizes internal buffers accordingly and the JVM ends up attempting
to allocate an enormous, contiguous block of memory. This can cause
significant heap fragmentation and make it hard for the garbage collector to
work efficiently; as a result, overall performance degrades or the node
becomes unstable.

I've run into this problem multiple times with SolrCloud, and it often
resulted in recurring OutOfMemoryError exceptions.
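
Just to illustrate the arithmetic behind the 17179868936-byte request in your
GC log, here is a minimal, hypothetical Java sketch (not actual Solr or
Lucene code; the class name, header size, and variable names are made up for
illustration) showing how a limit of Integer.MAX_VALUE, once capped near
Lucene's MAX_ARRAY_LENGTH, becomes a single ~16GB allocation on a 64-bit JVM:

  // Hypothetical sketch, not Solr code: sizing an internal array directly
  // from a request parameter such as facet.limit or rows.
  public class HugeLimitSketch {

      // Assumes a 16-byte array header, in the spirit of the
      // MAX_ARRAY_LENGTH definition quoted in your mail.
      static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 16;

      public static void main(String[] args) {
          int requestedLimit = Integer.MAX_VALUE;        // e.g. &facet.limit=2147483647
          int arrayLength = Math.min(requestedLimit, MAX_ARRAY_LENGTH);
          long bytes = (long) arrayLength * Long.BYTES;  // 8 bytes per element on a 64-bit JVM
          System.out.printf("Single allocation attempt of ~%.1f GB%n",
                  bytes / (1024.0 * 1024 * 1024));
          // long[] counts = new long[arrayLength];      // this single line would request
          //                                             // ~16GB at once and can fail with OOM
      }
  }

Of course this only shows the arithmetic; whether Solr really sizes a buffer
from the limit in your case is exactly the open question.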



On Thu, Jul 31, 2025 at 10:37 AM Antonio Nunziante <[email protected]>
wrote:

> Dear Solr Community,
>
>
>
> I'm running Solr 8.11.1 in SolrCloud mode (3 nodes, 44GB heap each), and
> I'm
> investigating a critical OutOfMemoryError.
>
>
>
> The GC logs show Solr attempting to allocate an object of 17179868936
> bytes, which I suspect corresponds to an array of 2147483617 elements on a
> 64-bit JVM (8-byte element size): 2147483617 * 8 = 17179868936 bytes.
>
> In Java, Integer.MAX_VALUE is 2147483647, and 2147483617 is just slightly
> below it. This corresponds to MAX_ARRAY_LENGTH, the maximum safe Java array
> size used internally by Lucene's ArrayUtil.oversize(), and probably in some
> other places in the Solr source code as well.
>
>
>
>   /** Maximum length for an array (Integer.MAX_VALUE -
>       RamUsageEstimator.NUM_BYTES_ARRAY_HEADER). */
>   public static final int MAX_ARRAY_LENGTH =
>       Integer.MAX_VALUE - RamUsageEstimator.NUM_BYTES_ARRAY_HEADER;
>
>
>
> This leads to a 16GB allocation attempt on 64-bit JVMs, which is what
> eventually triggers the OOM, roughly once per day on each node. Often when
> one node restarts, the other nodes restart as well.
>
>
>
> Some details about our setup:
>
> *       Red Hat Enterprise Linux 8.4
> *       OpenJDK 64-Bit Server VM (build 21+35-2513)
> *       Solr 8.11.1, 3 nodes, 44GB heap over 64GB of total RAM (4GB of
> swap)
> *       G1GC, default parameters (-XX:+AlwaysPreTouch
> -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem -XX:+UseG1GC -XX:+UseLargePages
> -XX:-OmitStackTraceInFastThrow -XX:MaxGCPauseMillis=250)
>
>
>
> Solr contains 42 collections, each with 3 shards and 3 replicas:
>
> *       21 collections are kept empty and are used as support when
> re-indexing (we index from scratch into an empty collection and then swap it
> with the currently active one by modifying aliases; the old one is then
> emptied)
> *       21 collections contain documents, but only 6 of these serve most of
> the main search requests:
>
> *       number of documents per collection ranges from 20k to 70k
> *       average document size ranges from 10KB to 50KB
> *       on disk the biggest shard is 750MB, and each node has a total of
> 5GB to 6GB on disk
> *       We have lots of dynamic fields (like *_s, *_b, *_d, etc.); each of
> these 6 collections has 20k to 40k such fields
>
>
>
> Requests are around 1000 per minute, mostly edismax queries, usually
> retrieving fewer than 50 documents with 20 to 30 fields, some facets (around
> 30 different fields), and some filters. We also sort on a couple of fields
> (around 10 different fields).
>
> Some of these requests send the &facet.limit parameter as Integer.MAX_VALUE,
> but if this were the problem the OOMs should happen every minute, which is
> not our case (by the way, we are fixing this by forcing -1 instead of
> MAX_VALUE).
>
>
>
> Here is the relevant solr_gc.log extract:
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Eden regions: 0->0(407)
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Survivor regions:
> 0->0(87)
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Old regions: 424->418
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Humongous regions:
> 512->512
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Metaspace:
> 99070K(100224K)->99002K(100224K) NonClass: 88782K(89472K)->88726K(89472K)
> Class: 10287K(10752K)->10276K(10752K)
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Heap after GC
> invocations=689 (full 2):
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688)  garbage-first heap
> total 46137344K, used 30443099K [0x00007f5e8c000000, 0x00007f698c000000)
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688)   region size 32768K, 0
> young (0K), 0 survivors (0K)
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688)  Metaspace       used
> 99002K, committed 100224K, reserved 1179648K
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688)   class space    used
> 10276K, committed 10752K, reserved 1048576K
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) Pause Full (G1
> Compaction
> Pause) 29774M->29729M(45056M) 7798.255ms
>
> [2025-07-30T07:56:35.977+0200][38364.361s] GC(688) User=47.22s Sys=0.05s
> Real=7.80s
>
> [2025-07-30T07:56:35.977+0200][38364.361s] Attempt heap expansion
> (allocation request failed). Allocation request: 17179868936B
>
> [2025-07-30T07:56:35.977+0200][38364.361s] Expand the heap. requested
> expansion amount: 17179868936B expansion amount: 17179869184B
>
> [2025-07-30T07:56:35.977+0200][38364.361s] Did not expand the heap (heap
> already fully expanded)
>
> [2025-07-30T07:56:35.979+0200][38364.363s] G1 Service Thread (Card Set Free
> Memory Task) (run: 14470.631ms) (cpu: 0.642ms)
>
> [2025-07-30T07:56:35.979+0200][38364.363s] G1 Service Thread (Periodic GC
> Task) (run 14061.096ms after schedule)
>
> [2025-07-30T07:56:35.979+0200][38364.363s] G1 Service Thread (Periodic GC
> Task) (run: 0.010ms) (cpu: 0.000ms)
>
> [2025-07-30T07:56:35.989+0200][38364.373s] G1 Service Thread (Card Set Free
> Memory Task) (run 0.202ms after schedule)
>
>
>
> I need help identifying which part of Solr or Lucene could be responsible
> for that allocation. We do not have millions of facet terms (I think the
> maximum should be in the thousands) or unusually large result sets.
>
> If anyone can point me to known causes, relevant classes, or similar
> previous issues, it would be greatly appreciated.
>
>
>
> Thanks,
>
> Antonio
>
>
>
>

-- 
Vincenzo D'Amore
