As another detail: we have left the WriteThrottlingEnabled property at its default value of 'false', so I would not ordinarily expect throttling, correct?
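For concreteness, this is roughly how the settings discussed in this thread look in an Ignite.NET (C#) configuration. This is only a sketch showing the defaults mentioned below; the region name and sizes are illustrative, not our exact production values:

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Configuration;

var cfg = new IgniteConfiguration
{
    DataStorageConfiguration = new DataStorageConfiguration
    {
        CheckpointThreads = 4,                           // default, discussed in question 1
        CheckpointFrequency = TimeSpan.FromSeconds(180), // default, discussed in question 2
        WriteThrottlingEnabled = false,                  // default, discussed in question 3
        DefaultDataRegionConfiguration = new DataRegionConfiguration
        {
            Name = "Default_Region",                     // illustrative name
            PersistenceEnabled = true,
            MaxSize = 4L * 1024 * 1024 * 1024            // 4 GB, as in the primary region below
        }
    }
};
```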
On Tue, Dec 29, 2020 at 10:04 AM Raymond Wilson <raymond_wil...@trimble.com> wrote:

> Hi Ilya,
>
> Regarding the throttling question, I have not yet looked at thread dumps -
> the observed behaviour has been seen in production metrics and logging.
> What would you expect a thread dump to show in this case?
>
> Given my description of the sizes of the data regions and the number of
> pages being updated in a checkpoint, would you expect any throttling
> behaviour?
>
> Thanks,
> Raymond.
>
> On Mon, Dec 28, 2020 at 11:53 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> 1. If we knew the specific circumstances in which a specific setting
>> value would yield the most benefit, we would already have made that value
>> the default. A setting means that you may tune it and get better results,
>> or not; in general we can't promise you anything. I did see improvements
>> from increasing this setting in one very specific setup, but in general
>> you may leave it as is.
>>
>> 2. More frequent checkpoints mean increased write amplification, so
>> reducing this value may overwhelm your system with load that it was
>> previously able to handle. You can set it to an arbitrarily small value,
>> in which case checkpoints will run purely sequentially, without any
>> pauses between them.
>>
>> 3. I don't think the default throttling mechanism will emit any
>> warnings. What do you see in thread dumps?
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>> On Wed, 23 Dec 2020 at 12:48, Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>
>>> Hi,
>>>
>>> We have been investigating some issues which appear to be related to
>>> checkpointing. We currently use Ignite 2.8.1 with the C# client.
>>>
>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>> configuration relate to the checkpointing process:
>>>
>>> 1. Number of checkpointing threads. This defaults to 4, but I don't
>>> understand how it applies to the checkpointing process.
>>> Are more threads generally better (e.g. because the disk IO is
>>> parallelised across the threads), or do they only have a positive effect
>>> when you have many data storage regions? Or something else? If this
>>> could be clarified in the documentation (or a pointer to it which Google
>>> has not yet found), that would be good.
>>>
>>> 2. Checkpoint frequency. This defaults to 180 seconds. I was thinking
>>> that reducing this interval would result in smaller, less disruptive
>>> checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
>>> practical lower limit for use cases where new data is constantly being
>>> added, e.g. 5 or 10 seconds?
>>>
>>> 3. Write exclusivity constraints during checkpointing. I understand that
>>> while a checkpoint is occurring, ongoing writes into the caches being
>>> checkpointed are still supported, and writes to existing pages are
>>> duplicated into the checkpoint buffer. If this buffer becomes full or
>>> stressed, Ignite will throttle, and perhaps block, writes until the
>>> checkpoint is complete; in that case Ignite will emit logging (warning
>>> or informational?) that writes are being throttled.
>>>
>>> We have cases where simple puts to caches (a few requests per second)
>>> take up to 90 seconds to execute while a checkpoint triggered by the
>>> checkpoint timer is active; when no checkpoint is occurring, the same
>>> puts usually take milliseconds. The checkpoints themselves can take 90
>>> seconds or longer, and update up to 30,000-40,000 pages across a pair of
>>> data storage regions: one with 4 GB of in-memory space allocated (about
>>> 1,000,000 pages at the standard 4 KB page size), and one small region
>>> with 128 MB.
>>> There is no 'throttling' logging being emitted that we can tell, so it
>>> does not look like the checkpoint buffer (which should be 1 GB for the
>>> first data region and 256 MB for the second, smaller region in this
>>> case) can be filling up during the checkpoint.
>>>
>>> It seems the checkpoint is affecting the put operations, but I don't
>>> understand why that would be, given the documented checkpointing
>>> process, and the checkpoint itself (at least via informational logging)
>>> is not advertising any restrictions.
>>>
>>> Thanks,
>>> Raymond.
>>>
>>> --
>>> <http://www.trimble.com/>
>>> Raymond Wilson
>>> Solution Architect, Civil Construction Software Systems (CCSS)

--
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wil...@trimble.com
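P.S. As a back-of-envelope check of the page and buffer figures discussed in the quoted thread (assuming the default 4 KB Ignite page size; the dirty-page count is the upper end of what we observe):

```csharp
// Back-of-envelope arithmetic for the figures quoted above.
const long PageSize = 4 * 1024;                    // default 4 KB page size
const long RegionSize = 4L * 1024 * 1024 * 1024;   // 4 GB primary data region
const long BufferSize = 1L * 1024 * 1024 * 1024;   // 1 GB checkpoint buffer

long totalPages = RegionSize / PageSize;           // 1,048,576 - "about 1,000,000 pages"
long bufferPages = BufferSize / PageSize;          // 262,144 pages of headroom
long dirtyPages = 40_000;                          // upper end of observed checkpoint size
long checkpointBytes = dirtyPages * PageSize;      // ~160 MB written per checkpoint

// 40,000 dirty pages is well under the 262,144-page checkpoint buffer,
// which is consistent with seeing no throttling warnings in the logs.
```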