Hello,

I recently ran into an out-of-memory error on a persistent cache I set up a
few weeks ago. I have a single node with native persistence enabled, as well
as WAL archiving. I'm running Ignite ver. 2.8.1#20200521-sha1:86422096.

I looked at the stack trace, but I couldn't tell which part of the system ran
out of memory, or which parameters I should change to fix the problem. From
what I can tell of the stack dump, the error came from the WAL code (the
checkpoint thread was replaying WAL records when the allocation failed), but
the memory usage report logged about 35 seconds before the exception showed
plenty of free memory.

Can someone with more experience tuning Ignite memory point me toward the
configuration parameters I should adjust? Below are my log and my
configuration. (I have read the wiki page on memory tuning, but I'm happy
to be referred back to it.)

The log, showing the metrics right before the OOM, followed by the OOM
exception itself:

[2020-11-22T19:20:39,787][INFO ][grid-timeout-worker-#22][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=2845fe3e, uptime=5 days, 15:08:38.033]
    ^-- Cluster [hosts=1, CPUs=4, servers=1, clients=0, topVer=1, minorTopVer=1]
    ^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, xxx.xxx.xxx.xxx, 127.0.0.1, yyy.yyy.yyy.yyy], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=4, curLoad=0.33%, avgLoad=0.29%, GC=0%]
    ^-- Heap [used=316MB, free=62.34%, comm=812MB]
    ^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]
    ^-- Page memory [pages=1085139]
    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]
    ^--   default_region region [type=default, persistence=true, lazyAlloc=true,
      ...  initCfg=256MB, maxCfg=6144MB, usedRam=4288MB, freeRam=30.2%, allocRam=6144MB, allocTotal=4240MB]
    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.94%, allocRam=0MB, allocTotal=0MB]
    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
    ^-- Ignite persistence [used=4240MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[2020-11-22T19:21:15,585][ERROR][db-checkpoint-thread-#63][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.OutOfMemoryError]]
java.lang.OutOfMemoryError: null
        at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_121]
        at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1205) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.util.GridUnsafe.allocateBuffer(GridUnsafe.java:264) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.wal.ByteBufferExpander.<init>(ByteBufferExpander.java:36) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.<init>(AbstractWalRecordsIterator.java:125) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2701) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2637) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:944) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:920) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:347) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:243) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:122) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:104) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCpToEarliestCpMap(CheckpointHistory.java:242) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCheckpoint(CheckpointHistory.java:175) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3952) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3515) ~[ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3404) [ignite-core-2.9.0.jar:2.9.0]
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.9.0.jar:2.9.0]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
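As a sanity check on the metrics above (my own arithmetic, with the per-region allocRam values copied by hand from the log, not something Ignite computes), the regions add up exactly to the reported allocated=6344MB, and default_region still had about 1.8 GB of headroom under its 6144MB maxCfg, which matches its freeRam=30.2%. These figures only cover Ignite's data regions; the allocation that failed in the trace goes through Unsafe.allocateMemory from ByteBufferExpander, which looks like a separate native buffer rather than a page from one of these regions, though I may be misreading it.

```java
// Sanity check of the off-heap numbers in the metrics log above.
// The allocRam values are copied by hand from the 19:20:39 report;
// this is plain arithmetic, not an Ignite API.
import java.util.LinkedHashMap;
import java.util.Map;

public class RegionSums {
    // allocRam per data region, in MB, as printed in the metrics log.
    static Map<String, Long> allocRamMb() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("sysMemPlc", 100L);
        m.put("default_region", 6144L);
        m.put("metastoreMemPlc", 0L);
        m.put("TxLog", 100L);
        return m;
    }

    // Sum of allocRam over all regions; compare with "allocated=" on the Off-heap line.
    static long totalAllocatedMb() {
        long total = 0;
        for (long mb : allocRamMb().values()) {
            total += mb;
        }
        return total;
    }

    // Room left in default_region: maxCfg (6144MB) minus usedRam (4288MB).
    static long defaultRegionHeadroomMb() {
        return 6144L - 4288L;
    }

    public static void main(String[] args) {
        System.out.println("allocated total = " + totalAllocatedMb() + " MB");              // 6344 MB
        System.out.println("default_region headroom = " + defaultRegionHeadroomMb() + " MB"); // 1856 MB
    }
}
```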

My configuration:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
      <property name="gridLogger">
        <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
          <constructor-arg type="java.lang.String" value="/etc/apache-ignite/log4j2.xml"/>
        </bean>
      </property>
      <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
          <property name="storagePath" value="/data/ignite/persistent-cache"/>
          <property name="walPath" value="/data/ignite/wal"/>
          <property name="walArchivePath" value="/data/ignite/wal/archive"/>
          <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
              <property name="name" value="default_region"/>
              <property name="persistenceEnabled" value="true"/>
              <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
            </bean>
          </property>
          <property name="pageSize" value="#{4 * 1024}"/>
          <property name="maxWalArchiveSize" value="#{100L * 1024 * 1024 * 1024}"/>
        </bean>
      </property>
      <property name="cacheConfiguration">
        <list>
          <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="name" value="default" />
            <property name="atomicityMode" value="ATOMIC" />
            <property name="backups" value="1"/>
            <property name="dataRegionName" value="default_region"/>
          </bean>
        </list>
      </property>
      <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
          <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
              <property name="addresses">
                <list>
                  <value>127.0.0.1</value>
                </list>
              </property>
            </bean>
          </property>
        </bean>
      </property>
      <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
          <property name="idleConnectionTimeout" value="60000"/>
        </bean>
      </property>
    </bean>
</beans>
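For reference, here is what the size expressions in the configuration work out to, written as plain Java. I'm assuming Spring's #{...} expressions follow the same int/long arithmetic rules as Java; the last method shows why the L suffix on the WAL archive size matters, since the same product evaluated in 32-bit int wraps around to 0. The maxSize value matches the maxCfg=6144MB shown for default_region in the log.

```java
// What the size expressions in the XML config evaluate to (plain Java arithmetic).
// Assumption: SpEL's #{...} uses the same int/long semantics as Java here.
public class ConfigSizes {
    static final long MB = 1024L * 1024;

    // maxSize of default_region: #{6L * 1024 * 1024 * 1024}
    static long defaultRegionMaxSize() { return 6L * 1024 * 1024 * 1024; }

    // maxWalArchiveSize: #{100L * 1024 * 1024 * 1024}
    static long maxWalArchiveSize() { return 100L * 1024 * 1024 * 1024; }

    // pageSize: #{4 * 1024}; small enough that int arithmetic is fine.
    static int pageSize() { return 4 * 1024; }

    // The WAL archive expression without the L suffix: computed in int and overflows.
    static int walArchiveSizeAsInt() { return 100 * 1024 * 1024 * 1024; }

    public static void main(String[] args) {
        System.out.println(defaultRegionMaxSize() / MB + " MB"); // 6144 MB
        System.out.println(maxWalArchiveSize() / MB + " MB");    // 102400 MB (100 GB)
        System.out.println(pageSize() + " bytes");               // 4096 bytes
        System.out.println(walArchiveSizeAsInt());               // 0
    }
}
```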

Thanks in advance,

-- Scott
