Re: Binary recovery for a very long time

38797715 Tue, 12 May 2020 18:24:18 -0700

Hi Evgenii,

The storage used is not SSD.

We will use different versions of Ignite for further testing, such as Ignite 2.8.

Ignite is configured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="peerClassLoadingEnabled" value="true"/>
        <property name="consistentId" value="20"/>
        <property name="failureDetectionTimeout" value="120000"/>
        <property name="workDirectory" value="/appdata/ignite"/>
        <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
        <property name="rebalanceThrottle" value="100"/>
        <property name="rebalanceThreadPoolSize" value="4"/>
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
            </bean>
        </property>
        <property name="cacheConfiguration">
            <list>
                <bean id="partitioned-cache-template" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="cache-partitioned*"/>
                    <property name="cacheMode" value="PARTITIONED"/>
                    <property name="backups" value="1"/>
                    <property name="queryParallelism" value="16"/>
                    <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
                </bean>
                <bean id="replicated-cache-template" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="cache-replicated*"/>
                    <property name="cacheMode" value="REPLICATED"/>
                    <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
                </bean>
            </list>
        </property>
        <!-- Enabling Apache Ignite Persistent Store. -->
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="persistenceEnabled" value="true"/>
                        <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>
On 2020/5/13 at 4:45 AM, Evgenii Zhuravlev wrote:

Hi,

Can you share full logs and configuration? What disk do you use?

Evgenii

On Tue, 12 May 2020 at 06:49, 38797715 <38797...@qq.com> wrote:


    Of that time:

    CO_CO_NEW: ~48 minutes (partitioned, backups=1, 33M)

    Ignite sys cache: ~27 minutes

    PLM_ITEM: ~3 minutes (replicated, 1.9K)


    On 2020/5/12 at 9:08 PM, 38797715 wrote:


    Hi community,

    We have 5 servers with 16 cores, 256 GB memory, and 200 GB off-heap
    memory. We are testing 7 tables, with data volumes of 31.8M, 495.2M,
    552.3M, 33M, 873.3K, 28M and 1.9K respectively. The 1.9K table is
    replicated; the others are partitioned (backups = 1).

    VM args: -server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
    -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
    -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
    -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
    -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M
    -XX:+PrintAdaptiveSizePolicy

    Today, one of the servers was restarted (killed and then started
    with ignite.sh) for some reason, but the node took 1.5 hours to
    start, which was much longer than expected.

    After analyzing the log, we found the following:

    [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
    Found last checkpoint marker
    [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer
    [idx=10511, fileOff=51348888, len=61193]]
    [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
    Binary memory state restored at node startup
    [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110, len=0]]
    [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
    Resuming logging to WAL segment
    [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
    offset=51410110, ver=2]
    [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started
    page memory [memoryAllocated=200.0GiB, pages=50821088,
    tableSize=3.9GiB, checkpointBuffer=2.0GiB]
    [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=CO_CO_NEW, id=-189779360,
    dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
    backups=1, mvcc=false]
    [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
    dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
    backups=1, mvcc=false]
    [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
    dataRegionName=sysMemPlc, mode=REPLICATED,
    atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
    [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
    dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
    backups=1, mvcc=false]
    [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
    dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
    backups=1, mvcc=false]
    [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=PLM_ITEM, id=-1283854143,
    dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC,
    backups=2147483647, mvcc=false]
    [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=CO_CO, id=64322847,
    dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
    backups=1, mvcc=false]
    [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started
    cache in recovery mode [name=CO_CUST, id=1684722246,
    dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
    backups=1, mvcc=false]
    [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
    Binary recovery performed in 4970233ms.

    In total, binary recovery took 4970 seconds (about 83 minutes).
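The per-cache figures can be cross-checked from the log timestamps. The following is only a minimal Python sketch: the timestamps and cache names are copied from the log lines above, while the `events` list and the `parse` and `gaps` helpers are illustrative names, not part of Ignite. It measures the time between consecutive "Started cache in recovery mode" lines; the log itself does not say which recovery step each gap belongs to.

```python
from datetime import datetime

# Timestamps and names copied from the log excerpt above.
events = [
    ("2020-05-12T17:00:06,448", "page memory started"),
    ("2020-05-12T17:02:08,528", "CO_CO_NEW"),
    ("2020-05-12T17:50:44,341", "CO_CO_LINE"),
    ("2020-05-12T17:50:44,366", "ignite-sys-cache"),
    ("2020-05-12T18:17:57,071", "CO_CO_LINE_NEW"),
    ("2020-05-12T18:19:54,910", "PI_COM_DAY"),
    ("2020-05-12T18:19:54,949", "PLM_ITEM"),
    ("2020-05-12T18:22:53,662", "CO_CO"),
    ("2020-05-12T18:22:54,876", "CO_CUST"),
]


def parse(ts):
    """Parse a log timestamp such as 2020-05-12T17:02:08,528."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S,%f")


def gaps(events):
    """Return (previous event, next event, minutes between them) tuples."""
    result = []
    for (t0, name0), (t1, name1) in zip(events, events[1:]):
        minutes = (parse(t1) - parse(t0)).total_seconds() / 60
        result.append((name0, name1, minutes))
    return result


for prev, nxt, minutes in gaps(events):
    print(f"{prev} -> {nxt}: {minutes:.1f} min")

# Total reported by "Binary recovery performed in 4970233ms".
print(f"binary recovery total: {4970233 / 60000:.1f} min")
```

The three largest gaps follow the CO_CO_NEW, ignite-sys-cache and PLM_ITEM lines, which is where the ~48, ~27 and ~3 minute figures come from.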

    Our questions are:

    1. Why is the startup time so long?

    2. In the current state of Ignite, will the restart time keep
    getting longer as the data volume on a single node grows?

    3. Do you have any suggestions for optimizing the restart time?
