Re: Binary recovery for a very long time

Ilya Kasnacheev Thu, 21 May 2020 05:49:19 -0700

Hello!

1. I guess that the WAL is being read.
2. Unfortunately, as far as I understand, we do not have a truly graceful
exit.


Regards,
-- 
Ilya Kasnacheev


On Tue, 19 May 2020 at 10:22, 38797715 <38797...@qq.com> wrote:

> Hi,
>
> the following log message:
>
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
>
> I have the following questions:
>
> 1. What is done when a cache is started in recovery mode?
>
> 2. In our tests, recovery is also performed at startup even when the node
> was stopped normally (not an abnormal shutdown). Why?
> On 2020/5/18 at 9:58 PM, Ilya Kasnacheev wrote:
>
> Hello!
>
> The Direct IO module is experimental and should not be used unless its
> performance has first been tested in your specific use case.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, 18 May 2020 at 16:47, 38797715 <38797...@qq.com> wrote:
>
>> Hi,
>>
>> With direct IO disabled, startup is about twice as fast, and the same
>> holds in some other tests. I find that direct IO has a large impact on
>> read performance.
>> On 2020/5/14 at 5:16 AM, Evgenii Zhuravlev wrote:
>>
>> Can you share full logs from all nodes?
>>
>> On Tue, 12 May 2020 at 18:24, 38797715 <38797...@qq.com> wrote:
>>
>>> Hi Evgenii,
>>>
>>> The storage used is not SSD.
>>>
>>> We will use different versions of Ignite for further testing, such as
>>> Ignite 2.8.
>>> Ignite is configured as follows:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> xsi:schemaLocation="
>>> http://www.springframework.org/schema/beans
>>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>>> <bean id="ignite.cfg" class=
>>> "org.apache.ignite.configuration.IgniteConfiguration">
>>> <property name="peerClassLoadingEnabled" value="true"/>
>>> <property name="consistentId" value="20"/>
>>> <property name="failureDetectionTimeout" value="120000"/>
>>> <property name="workDirectory" value="/appdata/ignite"/>
>>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>>> <property name="rebalanceThrottle" value="100"/>
>>> <property name="rebalanceThreadPoolSize" value="4"/>
>>> <property name="gridLogger">
>>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>> <constructor-arg type="java.lang.String" value=
>>> "config/ignite-log4j2.xml"/>
>>> </bean>
>>> </property>
>>> <property name="cacheConfiguration">
>>> <list>
>>> <bean id="partitioned-cache-template" abstract="true" class=
>>> "org.apache.ignite.configuration.CacheConfiguration">
>>> <property name="name" value="cache-partitioned*"/>
>>> <property name="cacheMode" value="PARTITIONED" />
>>> <property name="backups" value="1" />
>>> <property name="queryParallelism" value="16"/>
>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>> </bean>
>>> <bean id="replicated-cache-template" abstract="true" class=
>>> "org.apache.ignite.configuration.CacheConfiguration">
>>> <property name="name" value="cache-replicated*"/>
>>> <property name="cacheMode" value="REPLICATED" />
>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>> </bean>
>>> </list>
>>> </property>
>>> <!-- Enabling Apache Ignite Persistent Store. -->
>>> <property name="dataStorageConfiguration">
>>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>> <property name="defaultDataRegionConfiguration">
>>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>> <property name="persistenceEnabled" value="true"/>
>>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>>> </bean>
>>> </property>
>>> </bean>
>>> </property>
>>> </bean>
>>> </beans>
>>> On 2020/5/13 at 4:45 AM, Evgenii Zhuravlev wrote:
>>>
>>> Hi,
>>>
>>> Can you share full logs and configuration? What disk do you use?
>>>
>>> Evgenii
>>>
>>> On Tue, 12 May 2020 at 06:49, 38797715 <38797...@qq.com> wrote:
>>>
>>>> Among them:
>>>> CO_CO_NEW: ~48 minutes (partitioned, backup=1, 33M)
>>>>
>>>> Ignite sys cache: ~27 minutes
>>>>
>>>> PLM_ITEM: ~3 minutes (replicated, 1.9K)
>>>>
>>>>
>>>> On 2020/5/12 at 9:08 PM, 38797715 wrote:
>>>>
>>>> Hi community,
>>>>
>>>> We have 5 servers with 16 cores, 256 GB of memory, and 200 GB of
>>>> off-heap memory.
>>>> We are testing 7 tables with the following data volumes: 31.8M, 495.2M,
>>>> 552.3M, 33M, 873.3K, 28M, and 1.9K. The 1.9K table is replicated; the
>>>> others are partitioned (backup = 1).
>>>>
>>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>
>>>> Today, one of the servers was restarted (kill, then start ignite.sh)
>>>> for some reason, but the node took 1.5 hours to start, which was much
>>>> longer than expected.
>>>>
>>>> After analyzing the log, the following information is found:
>>>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>>>> [idx=10511, fileOff=51410110, len=0]]
>>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>>> Resuming logging to WAL segment
>>>> [file=/appdata/ignite/db/wal/24/0000000000000001.wal, offset=51410110, ver=2]
>>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>>> checkpointBuffer=2.0 GiB]
>>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL,
>>>> backups=2147483647, mvcc=false]
>>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC,
>>>> backups=2147483647, mvcc=false]
>>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO, id=64322847,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Binary recovery performed in 4970233 ms.
>>>>
>>>> Of this, binary recovery took 4970 seconds.
>>>>
>>>> Our questions are:
>>>>
>>>> 1. Why is the startup time so long?
>>>>
>>>> 2. With the current state of Ignite, will the restart time keep growing
>>>> as the data volume on a single node grows?
>>>>
>>>> 3. Do you have any suggestions for optimizing the restart time?
>>>>
>>>>
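
As a rough way to check the per-cache figures quoted in the thread, the gaps
between consecutive "Started cache in recovery mode" messages can be computed
from the log timestamps. The sketch below is not part of the original
messages: the function names (parse_entries, gaps_after) are made up for
illustration, and the log text is condensed from the excerpt above to one
line per message. It attributes each gap to the cache named in the preceding
message, as the reporter did; whether that cache's recovery actually consumed
the time is an assumption.

```python
import re
from datetime import datetime

# Condensed from the log excerpt in the original report: one line per
# "Started cache in recovery mode" message (timestamp and cache name only).
LOG = """\
[2020-05-12T17:02:08,528] Started cache in recovery mode [name=CO_CO_NEW
[2020-05-12T17:50:44,341] Started cache in recovery mode [name=CO_CO_LINE
[2020-05-12T17:50:44,366] Started cache in recovery mode [name=ignite-sys-cache
[2020-05-12T18:17:57,071] Started cache in recovery mode [name=CO_CO_LINE_NEW
[2020-05-12T18:19:54,910] Started cache in recovery mode [name=PI_COM_DAY
[2020-05-12T18:19:54,949] Started cache in recovery mode [name=PLM_ITEM
[2020-05-12T18:22:53,662] Started cache in recovery mode [name=CO_CO
[2020-05-12T18:22:54,876] Started cache in recovery mode [name=CO_CUST
"""

# Timestamp at the start of the line, cache name after "[name=".
ENTRY = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\].*?\[name=([\w-]+)"
)


def parse_entries(text):
    """Return (timestamp, cache_name) pairs in log order."""
    entries = []
    for match in ENTRY.finditer(text):
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f")
        entries.append((ts, match.group(2)))
    return entries


def gaps_after(entries):
    """Return (cache_name, seconds until the next message) pairs.

    The last entry has no following message, so it is not included.
    """
    return [
        (name, (next_ts - ts).total_seconds())
        for (ts, name), (next_ts, _) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    for name, secs in gaps_after(parse_entries(LOG)):
        print(f"{name:18s} {secs / 60:6.1f} min")
```

For this excerpt the gaps after CO_CO_NEW, ignite-sys-cache, and PLM_ITEM come
out at roughly 48.6, 27.2, and 3.0 minutes, in line with the figures quoted in
the thread.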
