* In addition to Ilya's reply, you can check the vendor's page for more information; everything on that page applies to Ignite as well [1]. Increasing the number of checkpoint threads leads to more concurrent I/O, so if you have something like NVMe the choice is up to you, but on SAS disks it is probably better to reduce this parameter.
* The log will show something like: Parking thread=%Thread name% for timeout(ms)= %time% and the corresponding: Unparking thread=
* No additional logging of checkpoint buffer usage is provided. The checkpoint buffer needs to be more than 10% of the overall size of the persistent DataRegions.
* 90 seconds or longer: this looks like a problem with I/O or system tuning; that is a very poor result.

[1] https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning

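As a rough sanity check of the sizes mentioned in this thread, the arithmetic can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and not part of the Ignite API, the sizes are assumed to be binary units (4 GiB, 128 MiB, 4 KiB pages), and the 10% figure is the guideline from the reply above, not a value read from Ignite.

```java
// Hypothetical helper for illustration only; not part of the Ignite API.
class CheckpointSizing {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    /** Number of whole pages that fit in a region of the given size. */
    static long pages(long regionBytes, long pageBytes) {
        return regionBytes / pageBytes;
    }

    /** 10% of the combined size of the given persistent regions (guideline from the reply). */
    static long tenPercentOfRegions(long... regionBytes) {
        long total = 0;
        for (long r : regionBytes) {
            total += r;
        }
        return total / 10;
    }

    /** True if the combined checkpoint buffer exceeds the 10% guideline. */
    static boolean bufferLargeEnough(long totalBufferBytes, long... regionBytes) {
        return totalBufferBytes > tenPercentOfRegions(regionBytes);
    }

    /** Share of a region's pages written by one checkpoint, in percent. */
    static double dirtyPercent(long dirtyPages, long totalPages) {
        return 100.0 * dirtyPages / totalPages;
    }

    public static void main(String[] args) {
        long bigRegion = 4 * GB;      // first region from the question
        long smallRegion = 128 * MB;  // second region from the question
        long pageSize = 4 * KB;       // page size assumed in the question
        long buffers = 1 * GB + 256 * MB; // buffer sizes as stated in the question

        long bigPages = pages(bigRegion, pageSize);
        System.out.println("Pages in 4 GiB region:   " + bigPages);
        System.out.println("Pages in 128 MiB region: " + pages(smallRegion, pageSize));
        System.out.println("10% guideline (bytes):   " + tenPercentOfRegions(bigRegion, smallRegion));
        System.out.println("Buffers above guideline: " + bufferLargeEnough(buffers, bigRegion, smallRegion));
        System.out.printf("Dirty share at 40,000 pages: %.1f%%%n", dirtyPercent(40_000, bigPages));
    }
}
```

Under these assumptions, the buffer sizes quoted in the question are well above the 10% guideline, and a checkpoint of 30,000-40,000 pages touches only a few percent of the larger region. That is consistent with the suggestion that a 90-second checkpoint points at I/O or system tuning, not at buffer size.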
>Hi,
>
>We have been investigating some issues which appear to be related to
>checkpointing. We currently use the IA 2.8.1 with the C# client.
>
>I have been trying to gain clarity on how certain aspects of the Ignite
>configuration relate to the checkpointing process:
>
>1. Number of check pointing threads. This defaults to 4, but I don't
>understand how it applies to the checkpointing process. Are more threads
>generally better (eg: because it makes the disk IO parallel across the
>threads), or does it only have a positive effect if you have many data storage
>regions? Or something else? If this could be clarified in the documentation
>(or a pointer to it which Google has not yet found), that would be good.
>
>2. Checkpoint frequency. This is defaulted to 180 seconds. I was thinking that
>reducing this time would result in smaller less disruptive check points.
>Setting it to 60 seconds seems pretty safe, but is there a practical lower
>limit that should be used for use cases with new data constantly being added,
>eg: 5 seconds, 10 seconds?
>
>3. Write exclusivity constraints during checkpointing. I understand that while
>a checkpoint is occurring ongoing writes will be supported into the caches
>being check pointed, and if those are writes to existing pages then those will
>be duplicated into the checkpoint buffer. If this buffer becomes full or
>stressed then Ignite will throttle, and perhaps block, writes until the
>checkpoint is complete. If this is the case then Ignite will emit logging
>(warning or informational?) that writes are being throttled.
>
>We have cases where simple puts to caches (a few requests per second) are
>taking up to 90 seconds to execute when there is an active check point
>occurring, where the check point has been triggered by the checkpoint timer.
>When a checkpoint is not occurring the time to do this is usually in the
>milliseconds. The checkpoints themselves can take 90 seconds or longer, and
>are updating up to 30,000-40,000 pages, across a pair of data storage regions,
>one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the
>standard 4kb page size), and one small region with 128Mb. There is no
>'throttling' logging being emitted that we can tell, so the checkpoint buffer
>(which should be 1Gb for the first data region and 256 Mb for the second
>smaller region in this case) does not look like it can fill up during the
>checkpoint.
>
>It seems like the checkpoint is affecting the put operations, but I don't
>understand why that may be given the documented checkpointing process, and the
>checkpoint itself (at least via Informational logging) is not advertising any
>restrictions.
>
>Thanks,
>Raymond.
> --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>
