Hello! I think it's a sensible explanation.
Regards,
--
Ilya Kasnacheev

Wed, Jan 6, 2021 at 14:32, Raymond Wilson <raymond_wil...@trimble.com>:

> I checked our code that creates the primary data region, and it does set
> the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in
> that region.
>
> The secondary data region is much smaller, and is set to min/max = 128 Mb
> of memory.
>
> The checkpoints with the "too many dirty pages" reason were quoting less
> than 100,000 dirty pages, so this must have been triggered on the size of
> the smaller data region.
>
> Both these data regions have persistence, and I think this may have been a
> sub-optimal way to set it up. My aim was to provide a dedicated channel for
> inbound data arriving to be queued that was not impacted by updates due to
> processing of that data. I think it may be better to change this
> arrangement to use a single data region, to make the checkpointing process
> simpler and to reduce the cases where it decides there are too many dirty
> pages.
>
> On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
>> Hello!
>>
>> I guess it's pool.pages() * 3L / 4,
>> since, counter-intuitively, the default ThrottlingPolicy is not
>> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>>
>> Regards,
>>
>> --
>> Ilya Kasnacheev
>>
>>
>> Thu, Dec 31, 2020 at 04:33, Raymond Wilson <raymond_wil...@trimble.com>:
>>
>>> Regarding this section of code:
>>>
>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>>>     ? pool.pages() * 3L / 4
>>>     : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> I think the correct ratio will be 2/3 of pages, as we do not have a
>>> throttling policy defined. Correct?
>>>
>>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <arzamas...@mail.ru>
>>> wrote:
>>>
>>>> The relevant code runs from here:
>>>>
>>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 ||
>>>>     safeToUpdatePageMemories() || checkpointer.runner() == null)
>>>>     break;
>>>> else {
>>>>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0,
>>>>         "too many dirty pages");
>>>>
>>>> and nearby you can see this:
>>>>
>>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED ?
>>>>     pool.pages() * 3L / 4 : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>>
>>>> Thus, if 3/4 of all the pages in a DataRegion are dirty, this checkpoint
>>>> will be raised.
>>>>
>>>>
>>>> In (
>>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>>> there is a mention of a dirty pages limit that is a factor that can
>>>> trigger checkpoints.
>>>>
>>>> I also found this issue:
>>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>>
>>>> After reviewing our logs I found this (one example):
>>>>
>>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer]
>>>> Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>>> pages']
>>>>
>>>> This suggests we may have the issue where writes are frozen until the
>>>> checkpoint is completed.
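As a rough sanity check of those thresholds, assuming the default 4 KiB page size and the 3/4 branch that applies when the throttling policy is not DISABLED (a sketch with illustrative figures, not a statement about the actual runtime values):

    // Back-of-the-envelope check of the "too many dirty pages" threshold per
    // data region, assuming the default 4 KiB page size and the branch
    // maxDirtyPages = pool.pages() * 3L / 4 (throttling policy != DISABLED).
    public class DirtyPageThreshold {
        static long maxDirtyPages(long regionSizeBytes, long pageSizeBytes) {
            long pages = regionSizeBytes / pageSizeBytes; // pages the region can hold
            return pages * 3L / 4;                        // threshold for "too many dirty pages"
        }

        public static void main(String[] args) {
            long pageSize = 4096L; // default page size in Ignite 2.8.x

            // 4 GiB region: 1,048,576 pages, threshold 786,432 dirty pages.
            System.out.println(maxDirtyPages(4L * 1024 * 1024 * 1024, pageSize));

            // 128 MiB region: 32,768 pages, threshold 24,576 dirty pages, which is
            // consistent with checkpoints firing at well under 100,000 dirty pages.
            System.out.println(maxDirtyPages(128L * 1024 * 1024, pageSize));
        }
    }

If that arithmetic holds, the small 128 Mb region would hit its limit long before the 4 Gb region does, which matches the observation above.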
>>>>
>>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>>> appears to be 0.1 (10%), via this entry in
>>>> GridCacheDatabaseSharedManager.java:
>>>>
>>>> /**
>>>>  * Threshold to calculate limit for pages list on-heap caches.
>>>>  * <p>
>>>>  * Note: When a checkpoint is triggered, we need some amount of page
>>>>  * memory to store pages list on-heap cache. If a checkpoint is triggered
>>>>  * by "too many dirty pages" reason and pages list cache is rather big,
>>>>  * we can get {@code IgniteOutOfMemoryException}. To prevent this, we can
>>>>  * limit the total amount of cached page list buckets, assuming that
>>>>  * checkpoint will be triggered if no more than 3/4 of pages will be
>>>>  * marked as dirty (there will be at least 1/4 of clean pages) and each
>>>>  * cached page list bucket can be stored to up to 2 pages (this value is
>>>>  * not static, but depends on PagesCache.MAX_SIZE, so if
>>>>  * PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take more
>>>>  * than 2 pages). Also some amount of page memory needed to store page
>>>>  * list metadata.
>>>>  */
>>>> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>>
>>>> This raises two questions:
>>>>
>>>> 1. The data region where most writes are occurring has 4Gb allocated to
>>>> it, though it is permitted to start at a much lower level. 4Gb should be
>>>> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>>>
>>>> The 'limit holder' is calculated like this:
>>>>
>>>> /**
>>>>  * @return Holder for page list cache limit for given data region.
>>>>  */
>>>> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>>>     if (dataRegion.config().isPersistenceEnabled()) {
>>>>         return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(),
>>>>             name -> new AtomicLong(
>>>>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages()
>>>>                     * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>>     }
>>>>
>>>>     return null;
>>>> }
>>>>
>>>> ... but I am unsure whether totalPages() refers to the current size of
>>>> the data region, or the size it is permitted to grow to. i.e.: Could the
>>>> 'dirty page limit' be a sliding limit based on the growth of the data
>>>> region? Is it better to set the initial and maximum sizes of data regions
>>>> to be the same number?
>>>>
>>>> 2. We have two data regions, one supporting inbound arrival of data
>>>> (with low numbers of writes), and one supporting storage of processed
>>>> results from the arriving data (with many more writes).
>>>>
>>>> The block on writes due to the number of dirty pages appears to affect
>>>> all data regions, not just the one which has violated the dirty page
>>>> limit. Is that correct? If so, is this something that can be improved?
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson
>>>> <raymond_wil...@trimble.com> wrote:
>>>>
>>>> I'm working on getting automatic JVM thread stack dumping to occur if
>>>> we detect long delays in put (PutIfAbsent) operations. Hopefully this
>>>> will provide more information.
>>>>
>>>> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky
>>>> <arzamas...@mail.ru> wrote:
>>>>
>>>> I don't think so; checkpointing worked perfectly well before this fix.
>>>> We need additional info to start digging into your problem. Can you
>>>> share Ignite logs somewhere?
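On the question above about whether totalPages() is a sliding limit: one way to sidestep the ambiguity is to give the region equal initial and maximum sizes, so the page count (and anything derived from it) is fixed from startup. A minimal Java configuration sketch, with a placeholder region name and illustrative sizes rather than the production settings discussed here:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class FixedSizeRegionConfig {
        public static IgniteConfiguration build() {
            // Placeholder region name; sizes are illustrative.
            DataRegionConfiguration primary = new DataRegionConfiguration()
                .setName("primaryRegion")
                .setPersistenceEnabled(true)
                .setInitialSize(4L * 1024 * 1024 * 1024) // 4 GiB
                .setMaxSize(4L * 1024 * 1024 * 1024);    // equal to initial -> fixed page count

            return new IgniteConfiguration()
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDefaultDataRegionConfiguration(primary));
        }
    }

The equivalent InitialSize/MaxSize settings exist in the C# client configuration as well, so the same approach applies there.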
>>>>
>>>> I noticed an entry in the Ignite 2.9.1 changelog:
>>>>
>>>> - Improved checkpoint concurrent behaviour
>>>>
>>>> I am having trouble finding the relevant Jira ticket for this in the
>>>> 2.9.1 Jira area at
>>>> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>
>>>> Perhaps this change may improve the checkpointing issue we are seeing?
>>>>
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson
>>>> <raymond_wil...@trimble.com> wrote:
>>>>
>>>> Hi Zhenya,
>>>>
>>>> 1. We currently use AWS EFS for primary storage, with provisioned IOPS
>>>> to provide sufficient IO. Our Ignite cluster currently tops out at ~10%
>>>> usage (with at least 5 nodes writing to it, including WAL and WAL archive),
>>>> so we are not saturating the EFS interface. We use the default page size
>>>> (experiments with larger page sizes showed instability when checkpointing
>>>> due to free page starvation, so we reverted to the default size).
>>>>
>>>> 2. Thanks for the detail, we will look for that in thread dumps when we
>>>> can create them.
>>>>
>>>> 3. We are using the default CP buffer size, which is max(256Mb,
>>>> DataRegionSize / 4) according to the Ignite documentation, so this should
>>>> have more than enough checkpoint buffer space to cope with writes. As
>>>> additional information, the cache which is displaying very slow writes is
>>>> in a data region with relatively slow write traffic. There is a primary
>>>> (default) data region with large write traffic, and the vast majority of
>>>> pages being written in a checkpoint will be for that default data region.
>>>>
>>>> 4. Yes, this is very surprising. Anecdotally from our logs it appears
>>>> write traffic into the low write traffic cache is blocked during
>>>> checkpoints.
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky
>>>> <arzamas...@mail.ru> wrote:
>>>>
>>>> 1. In addition to Ilya's reply, you can check the vendor's page for
>>>> additional info; everything on that page is applicable to Ignite too [1].
>>>> Increasing the thread count leads to concurrent IO usage, so if you have
>>>> something like NVMe it's up to you, but in the case of SAS it is probably
>>>> better to reduce this parameter.
>>>> 2. The log will show you something like:
>>>>
>>>> Parking thread=%Thread name% for timeout(ms)= %time%
>>>>
>>>> and the corresponding:
>>>>
>>>> Unparking thread=
>>>>
>>>> 3. No additional logging of CP buffer usage is provided. The CP buffer
>>>> needs to be more than 10% of the overall persistent DataRegion size.
>>>> 4. 90 seconds or longer: that looks like a problem with IO or system
>>>> tuning; it is a very bad result, I'm afraid.
>>>>
>>>> [1]
>>>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>
>>>>
>>>> Hi,
>>>>
>>>> We have been investigating some issues which appear to be related to
>>>> checkpointing. We currently use AI 2.8.1 with the C# client.
>>>>
>>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>>> configuration relate to the checkpointing process:
>>>>
>>>> 1. Number of checkpointing threads. This defaults to 4, but I don't
>>>> understand how it applies to the checkpointing process.
>>>> Are more threads generally better (e.g. because it makes the disk IO
>>>> parallel across the threads), or does it only have a positive effect if
>>>> you have many data storage regions? Or something else? If this could be
>>>> clarified in the documentation (or a pointer to it which Google has not
>>>> yet found), that would be good.
>>>>
>>>> 2. Checkpoint frequency. This defaults to 180 seconds. I was thinking
>>>> that reducing this time would result in smaller, less disruptive
>>>> checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
>>>> practical lower limit that should be used for use cases with new data
>>>> constantly being added, e.g. 5 seconds, 10 seconds?
>>>>
>>>> 3. Write exclusivity constraints during checkpointing. I understand
>>>> that while a checkpoint is occurring, ongoing writes will be supported
>>>> into the caches being checkpointed, and if those are writes to existing
>>>> pages then those pages will be duplicated into the checkpoint buffer. If
>>>> this buffer becomes full or stressed then Ignite will throttle, and
>>>> perhaps block, writes until the checkpoint is complete. If this is the
>>>> case then Ignite will emit logging (warning or informational?) that
>>>> writes are being throttled.
>>>>
>>>> We have cases where simple puts to caches (a few requests per second)
>>>> are taking up to 90 seconds to execute when there is an active checkpoint
>>>> occurring, where the checkpoint has been triggered by the checkpoint
>>>> timer. When a checkpoint is not occurring the time to do this is usually
>>>> in the milliseconds. The checkpoints themselves can take 90 seconds or
>>>> longer, and are updating up to 30,000-40,000 pages, across a pair of data
>>>> storage regions, one with 4Gb of in-memory space allocated (which should
>>>> be 1,000,000 pages at the standard 4kb page size), and one small region
>>>> with 128Mb. There is no 'throttling' logging being emitted that we can
>>>> tell, so it does not look like the checkpoint buffer (which should be 1Gb
>>>> for the first data region and 256Mb for the second, smaller, region in
>>>> this case) can fill up during the checkpoint.
>>>>
>>>> It seems like the checkpoint is affecting the put operations, but I
>>>> don't understand why that may be, given the documented checkpointing
>>>> process, and the checkpoint itself (at least via informational logging)
>>>> is not advertising any restrictions.
>>>>
>>>> Thanks,
>>>> Raymond.
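To make the knobs discussed above concrete, here is a minimal sketch of the checkpoint-related settings (checkpoint frequency, checkpoint threads, write throttling, and an explicit checkpoint page buffer); the values are illustrative, not recommendations for this cluster:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    public class CheckpointTuningSketch {
        public static DataStorageConfiguration build() {
            // Placeholder region name; sizes and timings are illustrative.
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("primaryRegion")
                .setPersistenceEnabled(true)
                .setMaxSize(4L * 1024 * 1024 * 1024)               // 4 GiB region
                .setCheckpointPageBufferSize(1024L * 1024 * 1024); // explicit 1 GiB checkpoint buffer

            return new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region)
                .setCheckpointFrequency(60_000L)  // 60 s instead of the 180 s default
                .setCheckpointThreads(4)          // default number of checkpoint writer threads
                .setWriteThrottlingEnabled(true); // slow writers gradually rather than block them
        }
    }

Enabling write throttling is the usual suggestion when puts stall during checkpoints, since it slows writers progressively instead of letting them hit a hard stop when the checkpoint buffer or dirty-page limit is reached.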
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com