As another detail: we have left the WriteThrottlingEnabled property at its default value of 'false', so I would not ordinarily expect throttling, correct?
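For concreteness, this is roughly how the settings discussed in this thread look in an Ignite.NET (C#) configuration. This is only a sketch showing the defaults mentioned below; the region name and sizes are illustrative, not our exact production values:

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Configuration;

var cfg = new IgniteConfiguration
{
    DataStorageConfiguration = new DataStorageConfiguration
    {
        CheckpointThreads = 4,                           // default, discussed in question 1
        CheckpointFrequency = TimeSpan.FromSeconds(180), // default, discussed in question 2
        WriteThrottlingEnabled = false,                  // default, discussed in question 3
        DefaultDataRegionConfiguration = new DataRegionConfiguration
        {
            Name = "Default_Region",                     // illustrative name
            PersistenceEnabled = true,
            MaxSize = 4L * 1024 * 1024 * 1024            // 4 GB, as in the primary region below
        }
    }
};
```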
On Tue, Dec 29, 2020 at 10:04 AM Raymond Wilson <raymond_wil...@trimble.com> wrote:

> Hi Ilya,
>
> Regarding the throttling question, I have not yet looked at thread dumps -
> the observed behaviour has been seen in production metrics and logging.
> What would you expect a thread dump to show in this case?
>
> Given my description of the sizes of the data regions and the number of
> pages being updated in a checkpoint, would you expect any throttling
> behaviour?
>
> Thanks,
> Raymond.
>
> On Mon, Dec 28, 2020 at 11:53 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> 1. If we knew the specific circumstances in which a specific setting
>> value would yield the most benefit, we would already have made that value
>> the default. A setting means that you may tune it and get better results,
>> or not; in general we can't promise you anything. I did see improvements
>> from increasing this setting in one very specific setup, but in general
>> you may leave it as is.
>>
>> 2. More frequent checkpoints mean increased write amplification, so
>> reducing this value may overwhelm your system with load that it was
>> previously able to handle. You can set it to an arbitrarily small value,
>> in which case checkpoints will run purely sequentially, without any
>> pauses between them.
>>
>> 3. I don't think the default throttling mechanism will emit any
>> warnings. What do you see in thread dumps?
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>> On Wed, 23 Dec 2020 at 12:48, Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>
>>> Hi,
>>>
>>> We have been investigating some issues which appear to be related to
>>> checkpointing. We currently use Ignite 2.8.1 with the C# client.
>>>
>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>> configuration relate to the checkpointing process:
>>>
>>> 1. Number of checkpointing threads. This defaults to 4, but I don't
>>> understand how it applies to the checkpointing process.
>>> Are more threads generally better (e.g. because the disk IO is
>>> parallelised across the threads), or do they only have a positive effect
>>> when you have many data storage regions? Or something else? If this
>>> could be clarified in the documentation (or a pointer to it which Google
>>> has not yet found), that would be good.
>>>
>>> 2. Checkpoint frequency. This defaults to 180 seconds. I was thinking
>>> that reducing this interval would result in smaller, less disruptive
>>> checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
>>> practical lower limit for use cases where new data is constantly being
>>> added, e.g. 5 or 10 seconds?
>>>
>>> 3. Write exclusivity constraints during checkpointing. I understand that
>>> while a checkpoint is occurring, ongoing writes into the caches being
>>> checkpointed are still supported, and writes to existing pages are
>>> duplicated into the checkpoint buffer. If this buffer becomes full or
>>> stressed, Ignite will throttle, and perhaps block, writes until the
>>> checkpoint is complete; in that case Ignite will emit logging (warning
>>> or informational?) that writes are being throttled.
>>>
>>> We have cases where simple puts to caches (a few requests per second)
>>> take up to 90 seconds to execute while a checkpoint triggered by the
>>> checkpoint timer is active; when no checkpoint is occurring, the same
>>> puts usually take milliseconds. The checkpoints themselves can take 90
>>> seconds or longer, and update up to 30,000-40,000 pages across a pair of
>>> data storage regions: one with 4 GB of in-memory space allocated (about
>>> 1,000,000 pages at the standard 4 KB page size), and one small region
>>> with 128 MB.
>>> There is no 'throttling' logging being emitted that we can tell, so it
>>> does not look like the checkpoint buffer (which should be 1 GB for the
>>> first data region and 256 MB for the second, smaller region in this
>>> case) can be filling up during the checkpoint.
>>>
>>> It seems the checkpoint is affecting the put operations, but I don't
>>> understand why that would be, given the documented checkpointing
>>> process, and the checkpoint itself (at least via informational logging)
>>> is not advertising any restrictions.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)

--
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wil...@trimble.com
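P.S. As a back-of-envelope check of the page and buffer figures discussed in the quoted thread (assuming the default 4 KB Ignite page size; the dirty-page count is the upper end of what we observe):

```csharp
// Back-of-envelope arithmetic for the figures quoted above.
const long PageSize = 4 * 1024;                    // default 4 KB page size
const long RegionSize = 4L * 1024 * 1024 * 1024;   // 4 GB primary data region
const long BufferSize = 1L * 1024 * 1024 * 1024;   // 1 GB checkpoint buffer

long totalPages = RegionSize / PageSize;           // 1,048,576 - "about 1,000,000 pages"
long bufferPages = BufferSize / PageSize;          // 262,144 pages of headroom
long dirtyPages = 40_000;                          // upper end of observed checkpoint size
long checkpointBytes = dirtyPages * PageSize;      // ~160 MB written per checkpoint

// 40,000 dirty pages is well under the 262,144-page checkpoint buffer,
// which is consistent with seeing no throttling warnings in the logs.
```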