Hello!

1. I guess that the WAL is being read at that stage.
2. Unfortunately, as far as I understand, we do not have a truly graceful exit.
Regards,
--
Ilya Kasnacheev


On Tue, 19 May 2020 at 10:22, 38797715 <38797...@qq.com> wrote:

> Hi,
>
> Regarding the following log message:
>
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>
> I have the following questions:
>
> 1. What is done when a cache is started in recovery mode?
>
> 2. In our tests, even when the node is stopped normally (not an abnormal
> shutdown), the recovery process is still performed on startup. Why?
>
> On 2020/5/18 at 9:58 PM, Ilya Kasnacheev wrote:
>
> Hello!
>
> The Direct IO module is experimental and should not be used unless its
> performance has been tested first in your specific use case.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, 18 May 2020 at 16:47, 38797715 <38797...@qq.com> wrote:
>
>> Hi,
>>
>> If direct IO is disabled, startup is about twice as fast, and some other
>> tests show a similar effect. I find that direct IO has a great impact on
>> read performance.
>> On 2020/5/14 at 5:16 AM, Evgenii Zhuravlev wrote:
>>
>> Can you share full logs from all nodes?
>>
>> On Tue, 12 May 2020 at 18:24, 38797715 <38797...@qq.com> wrote:
>>
>>> Hi Evgenii,
>>>
>>> The storage used is not SSD.
>>>
>>> We will test further with other Ignite versions, such as
>>> Ignite 2.8.
>>> Ignite is configured as follows:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>        xsi:schemaLocation="
>>>        http://www.springframework.org/schema/beans
>>>        http://www.springframework.org/schema/beans/spring-beans.xsd">
>>>   <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>>     <property name="peerClassLoadingEnabled" value="true"/>
>>>     <property name="consistentId" value="20"/>
>>>     <property name="failureDetectionTimeout" value="120000"/>
>>>     <property name="workDirectory" value="/appdata/ignite"/>
>>>     <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>>>     <property name="rebalanceThrottle" value="100"/>
>>>     <property name="rebalanceThreadPoolSize" value="4"/>
>>>     <property name="gridLogger">
>>>       <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>         <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
>>>       </bean>
>>>     </property>
>>>     <property name="cacheConfiguration">
>>>       <list>
>>>         <bean id="partitioned-cache-template" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
>>>           <property name="name" value="cache-partitioned*"/>
>>>           <property name="cacheMode" value="PARTITIONED" />
>>>           <property name="backups" value="1" />
>>>           <property name="queryParallelism" value="16"/>
>>>           <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>>         </bean>
>>>         <bean id="replicated-cache-template" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
>>>           <property name="name" value="cache-replicated*"/>
>>>           <property name="cacheMode" value="REPLICATED" />
>>>           <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>>         </bean>
>>>       </list>
>>>     </property>
>>>     <!-- Enabling Apache Ignite Persistent Store. -->
>>>     <property name="dataStorageConfiguration">
>>>       <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>         <property name="defaultDataRegionConfiguration">
>>>           <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>             <property name="persistenceEnabled" value="true"/>
>>>             <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>>>           </bean>
>>>         </property>
>>>       </bean>
>>>     </property>
>>>   </bean>
>>> </beans>
>>> On 2020/5/13 at 4:45 AM, Evgenii Zhuravlev wrote:
>>>
>>> Hi,
>>>
>>> Can you share full logs and configuration? What disk do you use?
>>>
>>> Evgenii
>>>
>>> On Tue, 12 May 2020 at 06:49, 38797715 <38797...@qq.com> wrote:
>>>
>>>> Of that time:
>>>> CO_CO_NEW: ~48 minutes (partitioned, backups=1, 33M)
>>>>
>>>> Ignite sys cache: ~27 minutes
>>>>
>>>> PLM_ITEM: ~3 minutes (replicated, 1.9K)
>>>>
>>>>
>>>> On 2020/5/12 at 9:08 PM, 38797715 wrote:
>>>>
>>>> Hi community,
>>>>
>>>> We have 5 servers, each with 16 cores, 256 GB of memory, and 200 GB of
>>>> off-heap memory. We have 7 tables under test, with data volumes of
>>>> 31.8M, 495.2M, 552.3M, 33M, 873.3K, 28M, and 1.9K respectively. The 1.9K
>>>> table is replicated; the others are partitioned (backups = 1).
>>>>
>>>> VM args: -server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>
>>>> Today, one of the servers was restarted for some reason (the process was
>>>> killed and then started again with ignite.sh), but the node took 1.5 hours
>>>> to start, which was much longer than expected.
>>>>
>>>> After analyzing the log, we found the following:
>>>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager] Binary memory state restored at node startup [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110, len=0]]
>>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager] Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/0000000000000001.wal, offset=51410110, ver=2]
>>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB, checkpointBuffer=2.0 GiB]
>>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=CO_CO_LINE, id=-1588248812, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=ignite-sys-cache, id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
>>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=PI_COM_DAY, id=-1904194728, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
>>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=CO_CO, id=64322847, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager] Binary recovery performed in 4970233 ms.
>>>>
>>>> In particular, binary recovery took 4970 seconds.
>>>>
>>>> Our questions are:
>>>>
>>>> 1. Why does startup take so long?
>>>>
>>>> 2. As Ignite currently stands, will the restart time keep growing as the
>>>> data volume on a single node grows?
>>>>
>>>> 3. Do you have any suggestions for reducing the restart time?
>>>>
>>>>
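The per-cache figures in the "~48 minutes / ~27 minutes / ~3 minutes" summary can be reproduced from the quoted log by measuring the time from each "Started cache in recovery mode" record to the next recovery record. Below is a minimal Java sketch, assuming each log record sits on a single line in the format shown above. The class and method names (`RecoveryTimeline`, `gaps`) are hypothetical, and charging each gap to the cache named on the preceding record follows the summary's own attribution; the log alone does not show whether that time is spent on that cache or on the next one.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: measures gaps between recovery-phase log records.
class RecoveryTimeline {
    // Timestamp layout used in the quoted log, e.g. 2020-05-12T17:02:08,528.
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

    // "Started cache in recovery mode" record: group 1 = timestamp, group 2 = cache name.
    static final Pattern STARTED = Pattern.compile(
            "^\\[([^\\]]+)\\]\\[INFO \\]\\[main\\]\\[GridCacheProcessor\\] "
                    + "Started cache in recovery mode \\[name=([^,]+),");

    // "Binary recovery performed" record: closes the gap after the last cache.
    static final Pattern FINISHED = Pattern.compile(
            "^\\[([^\\]]+)\\]\\[INFO \\]\\[main\\]\\[GridCacheDatabaseSharedManager\\] "
                    + "Binary recovery performed");

    /** Returns "cache=seconds" for the time from each cache's record to the next recovery record. */
    static List<String> gaps(List<String> lines) {
        List<String> names = new ArrayList<>();        // null marks the end-of-recovery record
        List<LocalDateTime> times = new ArrayList<>();
        for (String line : lines) {
            Matcher started = STARTED.matcher(line);
            Matcher finished = FINISHED.matcher(line);
            if (started.find()) {
                names.add(started.group(2));
                times.add(LocalDateTime.parse(started.group(1), TS));
            } else if (finished.find()) {
                names.add(null);
                times.add(LocalDateTime.parse(finished.group(1), TS));
            }
            // Any other line is ignored.
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < times.size(); i++) {
            if (names.get(i) == null) {
                continue; // the end marker has no cache of its own
            }
            // getSeconds() truncates the fractional part of the gap.
            long seconds = Duration.between(times.get(i), times.get(i + 1)).getSeconds();
            result.add(names.get(i) + "=" + seconds);
        }
        return result;
    }

    public static void main(String[] args) {
        String p = "][INFO ][main][GridCacheProcessor] Started cache in recovery mode [name=";
        // Abridged records from the log above: only timestamp, cache name and id are kept.
        List<String> log = List.of(
                "[2020-05-12T17:02:08,528" + p + "CO_CO_NEW, id=-189779360]",
                "[2020-05-12T17:50:44,341" + p + "CO_CO_LINE, id=-1588248812]",
                "[2020-05-12T17:50:44,366" + p + "ignite-sys-cache, id=-2100569601]",
                "[2020-05-12T18:17:57,071" + p + "CO_CO_LINE_NEW, id=1742991829]",
                "[2020-05-12T18:19:54,910" + p + "PI_COM_DAY, id=-1904194728]",
                "[2020-05-12T18:19:54,949" + p + "PLM_ITEM, id=-1283854143]",
                "[2020-05-12T18:22:53,662" + p + "CO_CO, id=64322847]",
                "[2020-05-12T18:22:54,876" + p + "CO_CUST, id=1684722246]",
                "[2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager] "
                        + "Binary recovery performed in 4970233 ms.");
        gaps(log).forEach(System.out::println);
    }
}
```

Run on these nine records, it prints CO_CO_NEW=2915 (about 48.6 minutes), ignite-sys-cache=1632 (about 27.2 minutes) and PLM_ITEM=178 (about 3 minutes), matching the summary; every other gap is between 0 and 117 seconds.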