The code in question starts here:

    if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
        break;
    else {
        CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");

and nearby you can see the limit calculation:

    maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
        ? pool.pages() * 3L / 4
        : Math.min(pool.pages() * 2L / 3, cpPoolPages);

Thus, if 3/4 of all the pages of the whole DataRegion are dirty, this checkpoint ("too many dirty pages") will be raised.
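To put concrete numbers on this, here is a small standalone sketch (not the actual Ignite internals; the 4 GB region size is taken from your description below, and the default 4 KB page size and enabled throttling are assumed):

    public class DirtyPageThresholdSketch {
        public static void main(String[] args) {
            long regionSizeBytes = 4L * 1024 * 1024 * 1024;  // 4 GB data region (illustrative)
            long pageSize = 4096;                            // default page size
            long poolPages = regionSizeBytes / pageSize;     // ~1,048,576 pages

            // With throttling enabled the limit is 3/4 of the pool (see the snippet above);
            // with throttling disabled it would be min(2/3 of the pool, cpPoolPages).
            long maxDirtyPages = poolPages * 3L / 4;         // ~786,432 pages

            System.out.printf("pool = %d pages, checkpoint scheduled at %d dirty pages%n",
                poolPages, maxDirtyPages);
        }
    }

So for a 4 GB region the "too many dirty pages" checkpoint fires at roughly 786,000 dirty pages; the PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1 constant you quote below only caps the on-heap pages-list cache, it is not the checkpoint trigger.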
>In ( https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood ),
>there is a mention of a dirty pages limit that is a factor that can trigger checkpoints.
>
>I also found this issue:
>http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>where "too many dirty pages" is a reason given for initiating a checkpoint.
>
>After reviewing our logs I found this (one example):
>
>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started
>[checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339,
>fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms,
>writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421,
>reason='too many dirty pages']
>
>Which suggests we may have the issue where writes are frozen until the checkpoint is completed.
>
>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%),
>via this entry in GridCacheDatabaseSharedManager.java:
>
>    /**
>     * Threshold to calculate limit for pages list on-heap caches.
>     * <p>
>     * Note: When a checkpoint is triggered, we need some amount of page memory to store
>     * pages list on-heap cache. If a checkpoint is triggered by "too many dirty pages"
>     * reason and pages list cache is rather big, we can get {@code IgniteOutOfMemoryException}.
>     * To prevent this, we can limit the total amount of cached page list buckets, assuming
>     * that checkpoint will be triggered if no more than 3/4 of pages will be marked as dirty
>     * (there will be at least 1/4 of clean pages) and each cached page list bucket can be
>     * stored to up to 2 pages (this value is not static, but depends on PagesCache.MAX_SIZE,
>     * so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take more than 2 pages).
>     * Also some amount of page memory needed to store page list metadata.
>     */
>    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
>This raises two questions:
>
>1. The data region where most writes are occurring has 4Gb allocated to it, though it is
>permitted to start at a much lower level. 4Gb should be 1,000,000 pages, 10% of which should
>be 100,000 dirty pages.
>
>The 'limit holder' is calculated like this:
>
>    /**
>     * @return Holder for page list cache limit for given data region.
>     */
>    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>        if (dataRegion.config().isPersistenceEnabled()) {
>            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(),
>                name -> new AtomicLong((long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages()
>                    * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>        }
>
>        return null;
>    }
>
>... but I am unsure whether totalPages() refers to the current size of the data region, or the
>size it is permitted to grow to. i.e.: Could the 'dirty page limit' be a sliding limit based on
>the growth of the data region? Is it better to set the initial and maximum sizes of data regions
>to be the same number?
>
>2. We have two data regions, one supporting inbound arrival of data (with low numbers of
>writes), and one supporting storage of processed results from the arriving data (with many
>more writes).
>
>The block on writes due to the number of dirty pages appears to affect all data regions, not
>just the one which has violated the dirty page limit. Is that correct? If so, is this
>something that can be improved?
>
>Thanks,
>Raymond.
>
>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < raymond_wil...@trimble.com > wrote:
>>I'm working on getting automatic JVM thread stack dumping to occur if we detect long delays
>>in put (PutIfAbsent) operations. Hopefully this will provide more information.
>>
>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < arzamas...@mail.ru > wrote:
>>>
>>>Don't think so, checkpointing worked perfectly well already before this fix. We need
>>>additional info to start digging into your problem; can you share the Ignite logs somewhere?
>>>
>>>>I noticed an entry in the Ignite 2.9.1 changelog:
>>>>* Improved checkpoint concurrent behaviour
>>>>I am having trouble finding the relevant Jira ticket for this in the 2.9.1 Jira area at
>>>>https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>
>>>>Perhaps this change may improve the checkpointing issue we are seeing?
>>>>
>>>>Raymond.
>>>>
>>>>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson < raymond_wil...@trimble.com > wrote:
>>>>>Hi Zhenya,
>>>>>
>>>>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide
>>>>>sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes
>>>>>writing to it, including WAL and WAL archive), so we are not saturating the EFS interface.
>>>>>We use the default page size (experiments with larger page sizes showed instability when
>>>>>checkpointing due to free page starvation, so we reverted to the default size).
>>>>>
>>>>>2. Thanks for the detail, we will look for that in thread dumps when we can create them.
>>>>>
>>>>>3. We are using the default CP buffer size, which is max(256Mb, DataRegionSize / 4)
>>>>>according to the Ignite documentation, so this should have more than enough checkpoint
>>>>>buffer space to cope with writes. As additional information, the cache which is displaying
>>>>>very slow writes is in a data region with relatively slow write traffic. There is a
>>>>>primary (default) data region with large write traffic, and the vast majority of pages
>>>>>being written in a checkpoint will be for that default data region.
>>>>>
>>>>>4. Yes, this is very surprising. Anecdotally from our logs it appears write traffic into
>>>>>the low write traffic cache is blocked during checkpoints.
>>>>>
>>>>>Thanks,
>>>>>Raymond.
>>>>>
>>>>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky < arzamas...@mail.ru > wrote:
>>>>>>* Additionally to Ilya's reply, you can check the vendor's page for additional info;
>>>>>>everything on that page is applicable to Ignite too [1]. Increasing the number of threads
>>>>>>leads to concurrent IO usage, so if you have something like NVMe it's up to you, but in
>>>>>>the case of SAS it would possibly be better to reduce this parameter.
>>>>>>* The log will show you something like: Parking thread=%Thread name% for timeout(ms)=%time%
>>>>>>and the corresponding: Unparking thread=
>>>>>>* No additional logging of cp buffer usage is provided.
>>>>>>The cp buffer needs to be more than 10% of the overall size of the persistent DataRegions.
>>>>>>* 90 seconds or longer: seems like a problem in IO or system tuning; it's a very bad
>>>>>>score, I hope.
>>>>>>[1] https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>We have been investigating some issues which appear to be related to checkpointing. We
>>>>>>>currently use AI 2.8.1 with the C# client.
>>>>>>>
>>>>>>>I have been trying to gain clarity on how certain aspects of the Ignite configuration
>>>>>>>relate to the checkpointing process:
>>>>>>>
>>>>>>>1. Number of checkpointing threads. This defaults to 4, but I don't understand how it
>>>>>>>applies to the checkpointing process. Are more threads generally better (eg: because it
>>>>>>>makes the disk IO parallel across the threads), or does it only have a positive effect
>>>>>>>if you have many data storage regions? Or something else? If this could be clarified in
>>>>>>>the documentation (or a pointer to it which Google has not yet found), that would be good.
>>>>>>>
>>>>>>>2. Checkpoint frequency. This defaults to 180 seconds. I was thinking that reducing this
>>>>>>>time would result in smaller, less disruptive checkpoints. Setting it to 60 seconds seems
>>>>>>>pretty safe, but is there a practical lower limit that should be used for use cases with
>>>>>>>new data constantly being added, eg: 5 seconds, 10 seconds?
>>>>>>>
>>>>>>>3. Write exclusivity constraints during checkpointing. I understand that while a
>>>>>>>checkpoint is occurring, ongoing writes will be supported into the caches being
>>>>>>>checkpointed, and if those are writes to existing pages then those will be duplicated
>>>>>>>into the checkpoint buffer. If this buffer becomes full or stressed then Ignite will
>>>>>>>throttle, and perhaps block, writes until the checkpoint is complete. If this is the case
>>>>>>>then Ignite will emit logging (warning or informational?) that writes are being throttled.
>>>>>>>
>>>>>>>We have cases where simple puts to caches (a few requests per second) are taking up to
>>>>>>>90 seconds to execute when there is an active checkpoint occurring, where the checkpoint
>>>>>>>has been triggered by the checkpoint timer. When a checkpoint is not occurring, the time
>>>>>>>to do this is usually in the milliseconds. The checkpoints themselves can take 90 seconds
>>>>>>>or longer, and are updating up to 30,000-40,000 pages, across a pair of data storage
>>>>>>>regions, one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the
>>>>>>>standard 4kb page size), and one small region with 128Mb. There is no 'throttling'
>>>>>>>logging being emitted that we can tell, so the checkpoint buffer (which should be 1Gb for
>>>>>>>the first data region and 256Mb for the second smaller region in this case) does not look
>>>>>>>like it can fill up during the checkpoint.
>>>>>>>
>>>>>>>It seems like the checkpoint is affecting the put operations, but I don't understand why
>>>>>>>that may be, given the documented checkpointing process, and the checkpoint itself (at
>>>>>>>least via Informational logging) is not advertising any restrictions.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Raymond.
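Coming back to the configuration questions quoted above: here is a minimal sketch of pinning these knobs explicitly rather than relying on the defaults. This uses the Java API (the C# DataStorageConfiguration exposes equivalent properties), and the region name and sizes are purely illustrative, not a recommendation:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class StorageConfigSketch {
        public static IgniteConfiguration configure() {
            DataRegionConfiguration results = new DataRegionConfiguration()
                .setName("ProcessedResults")                       // hypothetical region name
                .setPersistenceEnabled(true)
                .setInitialSize(4L * 1024 * 1024 * 1024)           // initial size == max size,
                .setMaxSize(4L * 1024 * 1024 * 1024)               // so the region never grows
                .setCheckpointPageBufferSize(1024L * 1024 * 1024); // explicit 1 GB checkpoint buffer

            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setCheckpointFrequency(60_000)                    // 60 s instead of the default 180 s
                .setCheckpointThreads(4)                           // default; raise only if the disk handles parallel IO well
                .setWriteThrottlingEnabled(true)                   // slow writers gradually instead of stalling them when the buffer fills
                .setDataRegionConfigurations(results);

            return new IgniteConfiguration().setDataStorageConfiguration(storage);
        }
    }

Making the initial and maximum sizes equal also sidesteps the totalPages() question above: it no longer matters whether totalPages() reflects the current or the maximum region size, because the two are the same number.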
> --
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive | Christchurch, New Zealand
>+64-21-2013317 Mobile
>raymond_wil...@trimble.com