Hello, Raymond.

Usually, "experimental" means a feature that may change in the future.
That label usually concerns the feature's public API.

> Does this imply risk if run against a production environment grid?

It depends.
As for read repair, CHECK_ONLY is a read-only mode and can’t harm your data.
The other modes, which fix data inconsistencies, have been used in our 
production and there are no known issues.
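
If it helps, a check-only pass over several caches could be scripted roughly as 
below. This is only a sketch: it prints the commands instead of running them, 
and it uses only the flags from my previous message. The names CONTROL_SH, 
CACHES, PARTITIONS and build_check_cmd are illustrative placeholders, not part 
of Ignite.

```shell
#!/bin/sh
# Dry-run sketch: prints one CHECK_ONLY read-repair command per cache.
# CONTROL_SH, CACHES and PARTITIONS are hypothetical placeholders;
# adjust them to match your own grid before using the output.
CONTROL_SH="${CONTROL_SH:-./bin/control.sh}"
CACHES="${CACHES:-default}"
PARTITIONS="${PARTITIONS:-1,2,3}"

# Build the command line from the flags quoted in this thread.
# $1 = cache name, $2 = comma-separated partition list
build_check_cmd() {
    printf '%s --enable-experimental --consistency repair --cache %s --strategy CHECK_ONLY --partitions %s\n' \
        "$CONTROL_SH" "$1" "$2"
}

# Print one command per cache; nothing is executed against the cluster.
for cache in $CACHES; do
    build_check_cmd "$cache" "$PARTITIONS"
done
```

Review the printed commands first, then run them by hand or pipe them to sh.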


> On 22 Aug 2023, at 03:12, Raymond Wilson <[email protected]> wrote:
> 
> Thanks for the pointer to the read repair facility added in Ignite 2.14.
> 
> Unfortunately the .WithReadRepair() extension does not seem to be present in 
> the Ignite C# client.
> 
> This means we either need to use the experimental Command.sh support, or 
> improve our tooling to effectively do the same. I am curious why this is 
> labelled as experimental? Does this imply risk if run against a production 
> environment grid?
> 
> Raymond.
> 
> 
> On Mon, Aug 21, 2023 at 5:50 PM Николай Ижиков <[email protected] 
> <mailto:[email protected]>> wrote:
>> Hello.
>> 
>> I don’t know the cause of your issue.
>> But we have a feature to overcome it [1].
>> 
>> Consistency repair can be run from control.sh.
>> 
>> ```
>> ./bin/control.sh --enable-experimental
>> ...
>>   [EXPERIMENTAL]
>>   Check/Repair cache consistency using Read Repair approach:
>>     control.(sh|bat) --consistency repair cache-name partition
>> 
>>     Parameters:
>>       cache-name  - Cache to be checked/repaired.
>>       partition   - Cache's partition to be checked/repaired.
>> 
>>   [EXPERIMENTAL]
>>   Cache consistency check/repair operations status:
>>     control.(sh|bat) --consistency status
>> 
>>   [EXPERIMENTAL]
>>   Finalize partitions update counters:
>>     control.(sh|bat) --consistency finalize
>> ```
>> 
>> It seems that the docs for this command are incomplete.
>> It also accepts a strategy argument, so you can control your repair actions 
>> more precisely.
>> Try to run:
>> 
>> ```
>> ❯ ./bin/control.sh --enable-experimental --consistency repair --cache default --strategy CHECK_ONLY --partitions 1,2,3,…your_partitions_list...
>> ```
>> 
>> The available strategies, with good descriptions, can be found in the sources [2].
>> 
>> 
>> [1] https://ignite.apache.org/docs/latest/key-value-api/read-repair
>> [2] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
>> 
>> 
>> 
>>> On 21 Aug 2023, at 07:46, Raymond Wilson <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> [Replying onto correct thread]
>>> 
>>> As a follow up to this email, we are starting to collect evidence that 
>>> replicated caches within our Ignite grid are failing to replicate values in 
>>> a small number of cases. 
>>> 
>>> In the cases we observe so far, with a cluster of 4 nodes participating in 
>>> a replicated cache, only one node reports having the correct value for a 
>>> key, and the other three report having no value for that key.
>>> 
>>> The documentation is pretty opinionated about the 
>>> CacheWriteSynchronizationMode not being impactful with respect to 
>>> consistency for replicated caches. As noted below, we use PrimarySync (the 
>>> default) for these caches, which would suggest a potential failure mode 
>>> preventing the backup copies obtaining their copy once the primary copy has 
>>> been written.
>>>   
>>> We are continuing to investigate and would be interested in any suggestions 
>>> you may have as to the likely cause.
>>> 
>>> Thanks,
>>> Raymond.
>>> 
>>> On Thu, Jul 27, 2023 at 12:38 PM Raymond Wilson <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>> Hi,
>>>> 
>>>> I have a query regarding data safety of replicated caches in the case of 
>>>> hard failure of the compute resource but where the storage resource is 
>>>> available when the node returns.
>>>> 
>>>> We are using Ignite 2.15 with the C# client.
>>>> 
>>>> We have a number of these caches that have four nodes participating in the 
>>>> replicated caches, all with the default PrimarySync write synchronization 
>>>> mode. All data storage configurations are configured with WalMode = 
>>>> WalMode.Fsync.
>>>> 
>>>> We have logic performing writes against these caches which will continue 
>>>> once the primary node for the replicated cache has written the data item.
>>>> 
>>>> I am unsure of the guarantees made by Ignite at this point in the event of 
>>>> failure. Specifically, hard/red-button failure of compute hardware 
>>>> resources and/or abrupt (but recoverable) detachment of storage resources.
>>>> 
>>>> Scenario one: Primary node returns "OK", then immediately fails (before 
>>>> checkpoint). When the primary node returns, should I expect the replicated 
>>>> value to be in the primary, and to appear in all other nodes too?
>>>> 
>>>> Scenario two: Primary node returns "OK", then a secondary node immediately 
>>>> fails (before achieving the write and so before any checkpoint). When the 
>>>> secondary node returns, should I expect the replicated value to be in the 
>>>> recovered secondary node?
>>>> 
>>>> In relation to these scenarios, does setting the cache write 
>>>> synchronization mode improve the safety of the write, as all nodes must 
>>>> acknowledge the write before it returns?
>>>> 
>>>> If there is an improvement in write safety in this instance, does this 
>>>> imply the Fsync WalMode write pathway has opportunities for data loss in 
>>>> these failure situations?
>>>> 
>>>> Thanks,
>>>> Raymond.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>>  <http://www.trimble.com/>
>>>> Raymond Wilson
>>>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>>> 11 Birmingham Drive | Christchurch, New Zealand
>>>> [email protected] <mailto:[email protected]>
>>>>  
>>>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
