Re: Questions related to check pointing

Zhenya Stanilovsky Mon, 28 Dec 2020 22:31:28 -0800

*  In addition to Ilya's reply, you can check the vendor's page for more 
information; everything on that page applies to Ignite too [1]. Increasing the 
number of threads leads to concurrent I/O usage, so if you have something like 
NVMe it's up to you, but with SAS it would probably be better to reduce this parameter.
*  The log will show you something like:
Parking thread=%Thread name% for timeout(ms)= %time% and, correspondingly:
Unparking thread=
*  No additional logging of checkpoint buffer usage is provided. The checkpoint 
buffer needs to be more than 10% of the overall size of the persistent DataRegions.
*  90 seconds or longer: this looks like a problem with I/O or system tuning; 
that is a very bad figure, I think. 
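
To check whether throttling is actually happening, the log can be scanned for the 
"Parking thread" messages mentioned above. This is only a rough sketch: it assumes 
the message has the shape quoted in this reply, and the sample lines, timestamps 
and thread name are made up for illustration, not taken from a real Ignite log.

```python
import re

# Assumed message shape, based only on the text quoted in this reply:
#   Parking thread=<thread name> for timeout(ms)= <time>
PARK_RE = re.compile(
    r"Parking thread=(?P<thread>.+?) for timeout\(ms\)=\s*(?P<ms>\d+)"
)


def throttle_events(lines):
    """Return (thread name, park timeout in ms) for each matching log line."""
    events = []
    for line in lines:
        match = PARK_RE.search(line)
        if match:
            events.append((match.group("thread"), int(match.group("ms"))))
    return events


# Hypothetical sample lines, for illustration only.
sample = [
    "[12:00:01] Parking thread=example-worker-1 for timeout(ms)= 20",
    "[12:00:01] Unparking thread=example-worker-1",
    "[12:00:02] some unrelated line",
]

events = throttle_events(sample)
print(len(events), "throttling event(s) found")
```

If nothing matches during a slow checkpoint, write throttling is probably not the 
cause and I/O is the next thing to look at.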
[1] 
https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
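
The 10% rule can be checked against the region sizes from the quoted question with 
a little arithmetic. A minimal sketch, assuming the rule applies to the combined 
size of all persistent regions (it may be meant per region) and assuming binary 
units (1 GB = 1024 MB); the function and variable names are illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZE = 4 * KIB  # page size quoted in the original question


def pages_in_region(region_bytes, page_size=PAGE_SIZE):
    """Number of whole pages that fit in a data region."""
    return region_bytes // page_size


def min_checkpoint_buffer(region_sizes, percent=10):
    """Lower bound for the checkpoint buffer: percent of total region size."""
    return sum(region_sizes) * percent // 100


# Region sizes and buffer sizes as stated in the quoted question.
regions = [4 * GIB, 128 * MIB]
configured_buffers = [1 * GIB, 256 * MIB]

lower_bound = min_checkpoint_buffer(regions)
print("pages in 4 GB region:", pages_in_region(4 * GIB))
print("10% lower bound (MiB):", lower_bound / MIB)
print("configured total (MiB):", sum(configured_buffers) / MIB)
print("above lower bound:", sum(configured_buffers) > lower_bound)
```

With these numbers the buffers described in the question are well above the 10% 
mark, which fits the absence of throttling messages in the log.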



 
>Hi,
> 
>We have been investigating some issues which appear to be related to 
>checkpointing. We currently use the IA 2.8.1 with the C# client.
> 
>I have been trying to gain clarity on how certain aspects of the Ignite 
>configuration relate to the checkpointing process:
> 
>1. Number of check pointing threads. This defaults to 4, but I don't 
>understand how it applies to the checkpointing process. Are more threads 
>generally better (eg: because it makes the disk IO parallel across the 
>threads), or does it only have a positive effect if you have many data storage 
>regions? Or something else? If this could be clarified in the documentation 
>(or a pointer to it which Google has not yet found), that would be good.
> 
>2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking that 
>reducing this time would result in smaller less disruptive check points. 
>Setting it to 60 seconds seems pretty safe, but is there a practical lower 
>limit that should be used for use cases with new data constantly being added, 
>eg: 5 seconds, 10 seconds?
> 
>3. Write exclusivity constraints during checkpointing. I understand that while 
>a checkpoint is occurring ongoing writes will be supported into the caches 
>being check pointed, and if those are writes to existing pages then those will 
>be duplicated into the checkpoint buffer. If this buffer becomes full or 
>stressed then Ignite will throttle, and perhaps block, writes until the 
>checkpoint is complete. If this is the case then Ignite will emit logging 
>(warning or informational?) that writes are being throttled.
> 
>We have cases where simple puts to caches (a few requests per second) are 
>taking up to 90 seconds to execute when there is an active check point 
>occurring, where the check point has been triggered by the checkpoint timer. 
>When a checkpoint is not occurring the time to do this is usually in the 
>milliseconds. The checkpoints themselves can take 90 seconds or longer, and 
>are updating up to 30,000-40,000 pages, across a pair of data storage regions, 
>one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the 
>standard 4kb page size), and one small region with 128Mb. There is no 
>'throttling' logging being emitted that we can tell, so the checkpoint buffer 
>(which should be 1Gb for the first data region and 256 Mb for the second 
>smaller region in this case) does not look like it can fill up during the 
>checkpoint.
> 
>It seems like the checkpoint is affecting the put operations, but I don't 
>understand why that may be given the documented checkpointing process, and the 
>checkpoint itself (at least via Informational logging) is not advertising any 
>restrictions.
> 
>Thanks,
>Raymond.
>  --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>
