Hi Zhenya,

Thanks for the pointers - I will look into them.

I have been doing some additional reading and discovered we are using the
NFSv4.0 client, which seems to be the first 'no-no'; we will look at
updating to the NFSv4.1 client.

We have modified our default checkpoint timer cadence from 3 minutes to 1
minute, which seems to be giving us better performance. We will continue to
measure the impact.

Lastly, I'm planning to merge our two data regions into a single region to
reduce 'too many dirty pages' checkpoints due to high write activity in
a small region.

Would using larger page sizes (e.g. 16kb) be useful with EFS?
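
For reference, here is roughly how I expect the combined changes to look in
the node configuration. This is only a minimal Java sketch: the region name
and sizes are placeholders rather than our real values, and the 16kb page
size is the open question above, not something we have adopted.

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class StorageConfigSketch {
        public static IgniteConfiguration build() {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                // Checkpoint every 60 seconds instead of the 180 second default.
                .setCheckpointFrequency(60_000L)
                // Open question: would a 16kb page size help on EFS? The default is 4kb.
                .setPageSize(16 * 1024);

            // Single merged persistent region; the name and 4Gb size are placeholders.
            storageCfg.setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                .setName("merged")
                .setPersistenceEnabled(true)
                .setInitialSize(4L * 1024 * 1024 * 1024) // initial == max, no on-demand growth
                .setMaxSize(4L * 1024 * 1024 * 1024));

            return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
        }
    }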

Raymond.

On Tue, Jan 12, 2021 at 8:27 PM Zhenya Stanilovsky <arzamas...@mail.ru>
wrote:

> Hope these are helpful too:
>
> https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs
> https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html
>
>
> Hi Zhenya,
>
> The matching checkpoint finished log is this:
>
> 2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer] Checkpoint
> finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421,
> markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
> walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms,
> pagesWrite=1150ms, fsync=37104ms, total=38571ms]
>
> Regarding your comment that 3/4 of the pages in the whole data region need to
> be dirty to trigger this, can you confirm whether this is 3/4 of the maximum
> size of the data region, or of the currently used size? (e.g. if Min is 1Gb,
> Max is 4Gb, and used is 2Gb, would 1.5Gb of dirty pages trigger this?)
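>
> (For concreteness, assuming the default 4kb page size: 3/4 of the 4Gb maximum
> would be roughly 786,000 dirty pages, about 3Gb, whereas 3/4 of the 2Gb
> currently in use would be roughly 393,000 dirty pages, about 1.5Gb.)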
>
> Are data regions independently checkpointed, or are they checkpointed as a
> whole, so that a 'too many dirty pages' condition affects all data regions
> in terms of write blocking?
>
> Can you comment on my query about whether we should set the Min and Max size
> of the data region to be the same? i.e. don't bother growing the data region
> memory use on demand, just allocate the maximum?
>
> In terms of the checkpoint lock hold time metric, among the checkpoints
> citing 'too many dirty pages' there is one instance, apart from the one I
> provided earlier, that violates this limit, i.e.:
>
> 2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint
> started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66,
> startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573],
> checkpointBeforeLockTime=276ms, checkpointLockWait=0ms,
> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms,
> walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms,
> splitAndSortCpPagesDuration=276ms, pages=77774, reason='too many dirty
> pages']
>
> This is out of a population of 16 instances I can find. The remainder have
> lock times of 16-17ms.
>
> Regarding writes of pages to the persistent store, does the checkpointing
> system parallelise writes across partitions to maximise throughput?
>
> Thanks,
> Raymond.
>
>
>
> On Thu, Dec 31, 2020 at 1:17 AM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
>
> All write operations will be blocked for this timeout: checkpointLockHoldTime=32ms
> (write lock holding). If you observe a huge number of such messages with
> reason='too many dirty pages', maybe you need to store some data in
> non-persistent regions, for example, or reduce indexes (if you use them). And
> please attach the other part of the checkpoint message, starting with: Checkpoint finished.
>
>
>
>
> In (
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
> there is a mention of a dirty pages limit as one factor that can trigger
> checkpoints.
>
> I also found this issue:
> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
>
> After reviewing our logs, I found this (one example):
>
> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
> started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
> pages']
>
> This suggests we may have the issue where writes are frozen until the
> checkpoint is completed.
>
> Looking at the Apache Ignite 2.8.1 source code, the dirty page limit fraction
> appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>
>     /**
>      * Threshold to calculate limit for pages list on-heap caches.
>      * <p>
>      * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>      * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>      * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>      * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
>      * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>      * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>      * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>      */
>     private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
> This raises two questions:
>
> 1. The data region where most writes are occurring has 4Gb allocated to it,
> though it is permitted to start at a much lower level. 4Gb should be roughly
> 1,000,000 pages at the default 4kb page size, 10% of which should be about
> 100,000 dirty pages.
>
> The 'limit holder' is calculated like this:
>
>     /**
>      * @return Holder for page list cache limit for given data region.
>      */
>     public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>         if (dataRegion.config().isPersistenceEnabled()) {
>             return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>         }
>
>         return null;
>     }
>
> ... but I am unsure whether totalPages() refers to the current size of the
> data region, or the size it is permitted to grow to, i.e. could the 'dirty
> page limit' be a sliding limit based on the growth of the data region? Is it
> better to set the initial and maximum sizes of data regions to be the same
> number?
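>
> To make my question concrete, here is a small sketch of the limit computed by
> the quoted code under the two interpretations, assuming the default 4kb page
> size (the 1Gb 'current' figure is just illustrative, not our actual starting
> size):
>
>     public class PageListLimitSketch {
>         private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
>         public static void main(String[] args) {
>             long pageSize = 4 * 1024;
>
>             // If totalPages() reflects the 4Gb maximum size of the region:
>             long maxPages = 4L * 1024 * 1024 * 1024 / pageSize; // 1,048,576 pages
>             System.out.println((long)(maxPages * PAGE_LIST_CACHE_LIMIT_THRESHOLD)); // ~104,857
>
>             // If totalPages() reflects a currently allocated 1Gb:
>             long currentPages = 1024L * 1024 * 1024 / pageSize; // 262,144 pages
>             System.out.println((long)(currentPages * PAGE_LIST_CACHE_LIMIT_THRESHOLD)); // ~26,214
>         }
>     }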
>
> 2. We have two data regions, one supporting inbound arrival of data (with
> low numbers of writes), and one supporting storage of processed results
> from the arriving data (with many more writes).
>
> The block on writes due to the number of dirty pages appears to affect all
> data regions, not just the one which has violated the dirty page limit. Is
> that correct? If so, is this something that can be improved?
>
> Thanks,
> Raymond.
>
>
> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>
> I'm working on getting automatic JVM thread stack dumps to occur when we
> detect long delays in put (PutIfAbsent) operations. Hopefully this will
> provide more information.
>
> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
>
> I don't think so; checkpointing worked perfectly well before this fix. We
> need additional info to start digging into your problem. Can you share the
> Ignite logs somewhere?
>
>
>
> I noticed an entry in the Ignite 2.9.1 changelog:
>
>    - Improved checkpoint concurrent behaviour
>
> I am having trouble finding the relevant Jira ticket for this in the 2.9.1
> Jira area at
> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>
> Perhaps this change may improve the checkpointing issue we are seeing?
>
> Raymond.
>
>
> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>
> Hi Zhenya,
>
> 1. We currently use AWS EFS for primary storage, with provisioned IOPS to
> provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
> (with at least 5 nodes writing to it, including WAL and WAL archive), so we
> are not saturating the EFS interface. We use the default page size
> (experiments with larger page sizes showed instability when checkpointing
> due to free page starvation, so we reverted to the default size).
>
> 2. Thanks for the detail, we will look for that in thread dumps when we
> can create them.
>
> 3. We are using the default CP buffer size, which is max(256Mb,
> DataRegionSize / 4) according to the Ignite documentation, so this should
> have more than enough checkpoint buffer space to cope with writes (see the
> configuration sketch after point 4 below). As additional information, the
> cache which is displaying very slow writes is in a data region with
> relatively low write traffic. There is a primary (default) data region with
> large write traffic, and the vast majority of pages being written in a
> checkpoint will be for that default data region.
>
> 4. Yes, this is very surprising. Anecdotally, from our logs it appears that
> write traffic into the low-write-traffic cache is blocked during
> checkpoints.
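>
> For completeness, referring back to point 3: if we wanted to stop relying on
> the default CP buffer sizing, I believe it can be pinned per data region
> along these lines (a sketch only; the name and sizes are placeholders, not
> our exact configuration):
>
>     import org.apache.ignite.configuration.DataRegionConfiguration;
>
>     public class CpBufferSketch {
>         public static DataRegionConfiguration defaultRegion() {
>             return new DataRegionConfiguration()
>                 .setName("default")
>                 .setPersistenceEnabled(true)
>                 .setMaxSize(4L * 1024 * 1024 * 1024)
>                 // Pin the checkpoint buffer at 1Gb rather than relying on the computed default.
>                 .setCheckpointPageBufferSize(1024L * 1024 * 1024);
>         }
>     }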
>
> Thanks,
> Raymond.
>
>
>
> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
>
>    1. In addition to Ilya's reply, you can check the vendor's page for
>    additional info; everything on that page is applicable to Ignite too [1].
>    Increasing the thread count leads to concurrent IO usage, so if you have
>    something like NVMe it's up to you, but in the case of SAS it would
>    possibly be better to reduce this parameter.
>    2. The log will show you something like:
>
>    Parking thread=%Thread name% for timeout(ms)= %time%
>
>    and the corresponding:
>
>    Unparking thread=
>
>    3. No additional logging of checkpoint buffer usage is provided. The
>    checkpoint buffer needs to be more than 10% of the overall persistent
>    DataRegions size.
>    4. 90 seconds or longer: this seems like a problem with IO or system
>    tuning; it's a very bad score, I would say.
>
> [1]
> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>
>
>
>
>
> Hi,
>
> We have been investigating some issues which appear to be related to
> checkpointing. We currently use Apache Ignite 2.8.1 with the C# client.
>
> I have been trying to gain clarity on how certain aspects of the Ignite
> configuration relate to the checkpointing process:
>
> 1. Number of checkpointing threads. This defaults to 4, but I don't
> understand how it applies to the checkpointing process (see the sketch after
> point 3 below for the setting I mean). Are more threads generally better
> (e.g. because it makes the disk IO parallel across the threads), or does it
> only have a positive effect if you have many data storage regions? Or
> something else? If this could be clarified in the documentation (or a
> pointer to it which Google has not yet found), that would be good.
>
> 2. Checkpoint frequency. This defaults to 180 seconds. I was thinking that
> reducing this time would result in smaller, less disruptive checkpoints.
> Setting it to 60 seconds seems pretty safe, but is there a practical lower
> limit that should be used for use cases with new data constantly being
> added, e.g. 5 seconds, 10 seconds?
>
> 3. Write exclusivity constraints during checkpointing. I understand that
> while a checkpoint is occurring, ongoing writes will be supported into the
> caches being checkpointed, and if those are writes to existing pages then
> those will be duplicated into the checkpoint buffer. If this buffer becomes
> full or stressed then Ignite will throttle, and perhaps block, writes until
> the checkpoint is complete. If this is the case then Ignite will emit
> logging (warning or informational?) that writes are being throttled.
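>
> (Referring back to point 1: the setting I mean is
> DataStorageConfiguration.setCheckpointThreads(). A minimal sketch, with 4
> shown only because it is the documented default:)
>
>     import org.apache.ignite.configuration.DataStorageConfiguration;
>
>     public class CheckpointThreadsSketch {
>         public static DataStorageConfiguration storage() {
>             return new DataStorageConfiguration()
>                 // Number of checkpointing threads; 4 is the default I am asking about.
>                 .setCheckpointThreads(4);
>         }
>     }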
>
> We have cases where simple puts to caches (a few requests per second) are
> taking up to 90 seconds to execute when there is an active checkpoint
> occurring, where the checkpoint has been triggered by the checkpoint timer.
> When a checkpoint is not occurring the time to do this is usually in the
> milliseconds. The checkpoints themselves can take 90 seconds or longer, and
> are updating up to 30,000-40,000 pages, across a pair of data storage
> regions, one with 4Gb in-memory space allocated (which should be 1,000,000
> pages at the standard 4kb page size), and one small region with 128Mb. There
> is no 'throttling' logging being emitted that we can tell, so the checkpoint
> buffer (which should be 1Gb for the first data region and 256 Mb for the
> second smaller region in this case) does not look like it can fill up during
> the checkpoint.
>
> It seems like the checkpoint is affecting the put operations, but I don't
> understand why that would be, given the documented checkpointing process,
> and the checkpoint itself (at least via informational logging) is not
> advertising any restrictions.
>
> Thanks,
> Raymond.
>


-- 
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com