Hi,
We have a 4-node cluster with 64G of RAM per node and a 40G disk attached
to each node for each of /work and /walarchive. The WAL directory is 10G
per node.

Below are our data region configuration and JVM_OPTS. We are getting
timeouts on checkpointing, the WAL archive is filling up, and no data is
moving to the /work directory. The checkpointing log is included below.

Could you suggest what is wrong with these settings? (Note: with
maxWalArchiveSize at 16G and checkpointPageBufferSize at 4G, data was being
written, but very slowly, and the throttling rate was 35%, so I increased
both values. After that, the timeout errors started to appear.)

JVM_OPTS:     -XX:MaxDirectMemorySize=2g -Xms20g -Xmx25g
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC

Ignite config:

<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="walBufferSize" value="#{256L * 1024 * 1024}"/>
        <property name="checkpointFrequency" value="30000"/>
        <property name="checkpointThreads" value="12"/>
        <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
        <property name="maxWalArchiveSize" value="#{32L * 1024 * 1024 * 1024}"/>
        <property name="writeThrottlingEnabled" value="true"/>
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
                <!-- https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size -->
                <property name="checkpointPageBufferSize" value="#{8L * 1024 * 1024 * 1024}"/>
                <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
            </bean>
        </property>

        <property name="walPath" value="/ignite/wal"/>
        <property name="walArchivePath" value="/ignite/walarchive"/>
    </bean>
</property>
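
For reference, the sizes quoted in this message can be put side by side with a short sketch. It uses only the numbers given above. The variable names are made up for illustration and are not Ignite APIs, and it assumes that the 40G disk figure applies to /walarchive on its own. The data region size is not set in the config, so it is left out of the memory sum.

```python
# Rough size budget from the numbers quoted in this message.
# All names here are illustrative; none of them are Ignite APIs.
GIB = 1024 ** 3
MIB = 1024 ** 2

walarchive_disk = 40 * GIB         # assumption: 40G disk dedicated to /walarchive
wal_dir = 10 * GIB                 # "WAL dir is 10G per node"
ram_per_node = 64 * GIB            # "64G RAM" per node

heap_max = 25 * GIB                # -Xmx25g
checkpoint_page_buffer = 8 * GIB   # checkpointPageBufferSize
wal_segment = 512 * MIB            # walSegmentSize
max_wal_archive = 32 * GIB         # maxWalArchiveSize

# How many WAL segments fit under the archive limit and in the WAL directory.
archive_segments = max_wal_archive // wal_segment
segments_in_wal_dir = wal_dir // wal_segment

# Share of the /walarchive disk that the archive limit is allowed to use.
archive_share = max_wal_archive / walarchive_disk

# Memory that is explicitly sized in the settings (data region size not included).
heap_plus_cp_buffer = heap_max + checkpoint_page_buffer

print("segments under archive limit:", archive_segments)
print("segments that fit in WAL dir:", segments_in_wal_dir)
print("archive limit / disk size:   ", archive_share)
print("heap + checkpoint buffer GiB:", heap_plus_cp_buffer / GIB)
print("RAM per node GiB:            ", ram_per_node / GIB)
```

This is only arithmetic on the configured values, not a measurement of what the nodes actually use.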


Checkpointing logs:

[14:56:55,209][INFO][db-checkpoint-thread-#105][Checkpointer] Checkpoint
started [checkpointId=4b071279-8f25-4cd0-a5ef-e6f58f3a5653,
startPtr=WALPointer [idx=930, fileOff=527943956, len=51411],
checkpointBeforeLockTime=9ms, checkpointLockWait=0ms,
checkpointListenersExecuteTime=6ms, checkpointLockHoldTime=9ms,
walCpRecordFsyncDuration=7ms, writeCheckpointEntryDuration=3ms,
splitAndSortCpPagesDuration=45ms, pages=35542, reason='timeout']
[14:56:55,921][INFO][db-checkpoint-thread-#105][Checkpointer] Checkpoint
finished [cpId=4b071279-8f25-4cd0-a5ef-e6f58f3a5653, pages=35542,
markPos=WALPointer [idx=930, fileOff=527943956, len=51411],
walSegmentsCovered=[], markDuration=64ms, pagesWrite=108ms, fsync=604ms,
total=785ms]
