Hello, I recently ran into an out-of-memory error on a durable persistent cache I set up a few weeks ago. I have a single node, with durable persistence enabled, as well as WAL archiving. I'm running Ignite ver. 2.8.1#20200521-sha1:86422096.
I looked at the stack trace, but I couldn't get a clear fix on what part of the system ran out of memory, or what parameters I should change to fix the problem. From what I could tell of the stack dump, it looks like the WAL archive ran out of memory; but the memory usage report that occurred just a minute before the exception showed plenty of memory was available. Can someone with more experience tuning Ignite memory point me towards the configuration parameters I should adjust? Below are my log and my configuration. (I have read the wiki page on memory tuning, but I'm happy to be referred back to it.)

The log, with the metrics right before the OOM exception, then the OOM exception:

[2020-11-22T19:20:39,787][INFO ][grid-timeout-worker-#22][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=2845fe3e, uptime=5 days, 15:08:38.033]
    ^-- Cluster [hosts=1, CPUs=4, servers=1, clients=0, topVer=1, minorTopVer=1]
    ^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, xxx.xxx.xxx.xxx, 127.0.0.1, yyy.yyy.yyy.yyy], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=4, curLoad=0.33%, avgLoad=0.29%, GC=0%]
    ^-- Heap [used=316MB, free=62.34%, comm=812MB]
    ^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]
    ^-- Page memory [pages=1085139]
    ^-- sysMemPlc region [type=internal, persistence=true, lazyAlloc=false, ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]
    ^-- default_region region [type=default, persistence=true, lazyAlloc=true, ... initCfg=256MB, maxCfg=6144MB, usedRam=4288MB, freeRam=30.2%, allocRam=6144MB, allocTotal=4240MB]
    ^-- metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false, ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.94%, allocRam=0MB, allocTotal=0MB]
    ^-- TxLog region [type=internal, persistence=true, lazyAlloc=false, ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
    ^-- Ignite persistence [used=4240MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]

[2020-11-22T19:21:15,585][ERROR][db-checkpoint-thread-#63][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.OutOfMemoryError]]
java.lang.OutOfMemoryError: null
    at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_121]
    at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1205) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.util.GridUnsafe.allocateBuffer(GridUnsafe.java:264) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.ByteBufferExpander.<init>(ByteBufferExpander.java:36) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.<init>(AbstractWalRecordsIterator.java:125) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2701) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2637) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:944) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:920) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:347) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:243) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:122) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:104) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCpToEarliestCpMap(CheckpointHistory.java:242) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCheckpoint(CheckpointHistory.java:175) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3952) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3515) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3404) [ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.9.0.jar:2.9.0]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

My configuration:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String" value="/etc/apache-ignite/log4j2.xml"/>
            </bean>
        </property>
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="storagePath" value="/data/ignite/persistent-cache"/>
                <property name="walPath" value="/data/ignite/wal"/>
                <property name="walArchivePath" value="/data/ignite/wal/archive"/>
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="name" value="default_region"/>
                        <property name="persistenceEnabled" value="true"/>
                        <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
                <property name="pageSize" value="#{4 * 1024}"/>
                <property name="maxWalArchiveSize" value="#{100L * 1024 * 1024 * 1024}"/>
            </bean>
        </property>
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="default" />
                    <property name="atomicityMode" value="ATOMIC" />
                    <property name="backups" value="1"/>
                    <property name="dataRegionName" value="default_region"/>
                </bean>
            </list>
        </property>
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>127.0.0.1</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="idleConnectionTimeout" value="60000"/>
            </bean>
        </property>
    </bean>
</beans>

Thanks in advance,
-- Scott