Hello.
I don’t know the cause of your issue.
However, Ignite has a feature that can help you overcome it: Read Repair [1].
Consistency repair can be run from control.sh:
```
./bin/control.sh --enable-experimental
...
[EXPERIMENTAL]
Check/Repair cache consistency using Read Repair approach:
control.(sh|bat) --consistency repair cache-name partition
Parameters:
cache-name - Cache to be checked/repaired.
partition - Cache's partition to be checked/repaired.
[EXPERIMENTAL]
Cache consistency check/repair operations status:
control.(sh|bat) --consistency status
[EXPERIMENTAL]
Finalize partitions update counters:
control.(sh|bat) --consistency finalize
```
It seems the built-in help for this command is incomplete.
The command also accepts a strategy argument, so you can control the repair
actions more precisely.
Try running:
```
❯ ./bin/control.sh --enable-experimental --consistency repair --cache default
--strategy CHECK_ONLY --partitions 1,2,3,…your_partitions_list...
```
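If you need to run this for several caches or partition lists, it may help to build the command line in a small wrapper and review it before executing anything. The sketch below is only a dry run: it prints the command and does not call control.sh. The names `CONTROL_SH` and `build_repair_cmd` are made up for this example, and the flags are the same ones shown in the command above.

```shell
#!/bin/sh
# Dry-run helper: prints the control.sh invocation instead of executing it.
# CONTROL_SH and build_repair_cmd are hypothetical names used for this sketch.
CONTROL_SH="${CONTROL_SH:-./bin/control.sh}"

build_repair_cmd() {
    # $1 = cache name, $2 = strategy, $3 = comma-separated partition list
    if [ "$#" -ne 3 ]; then
        echo "usage: build_repair_cmd <cache> <strategy> <partitions>" >&2
        return 1
    fi
    printf '%s --enable-experimental --consistency repair --cache %s --strategy %s --partitions %s\n' \
        "$CONTROL_SH" "$1" "$2" "$3"
}

# Print the check-only command for partitions 1,2,3 of the "default" cache.
build_repair_cmd default CHECK_ONLY 1,2,3
```

Starting with CHECK_ONLY is the safer choice here, since as its name suggests it should only report inconsistencies; once the output looks right you can copy the printed line and run it, or swap in another strategy from [2].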
The available strategies are well described in the source code [2].
[1] https://ignite.apache.org/docs/latest/key-value-api/read-repair
[2] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
> On 21 Aug 2023, at 07:46, Raymond Wilson <[email protected]>
> wrote:
>
> [Replying onto correct thread]
>
> As a follow up to this email, we are starting to collect evidence that
> replicated caches within our Ignite grid are failing to replicate values in a
> small number of cases.
>
> In the cases we observe so far, with a cluster of 4 nodes participating in a
> replicated cache, only one node reports having the correct value for a key,
> and the other three report having no value for that key.
>
> The documentation is pretty opinionated about the
> CacheWriteSynchronizationMode not being impactful with respect to consistency
> for replicated caches. As noted below, we use PrimarySync (the default) for
> these caches, which would suggest a potential failure mode preventing the
> backup copies obtaining their copy once the primary copy has been written.
>
> We are continuing to investigate and would be interested in any suggestions
> you may have as to the likely cause.
>
> Thanks,
> Raymond.
>
> On Thu, Jul 27, 2023 at 12:38 PM Raymond Wilson <[email protected]
> <mailto:[email protected]>> wrote:
>> Hi,
>>
>> I have a query regarding data safety of replicated caches in the case of
>> hard failure of the compute resource but where the storage resource is
>> available when the node returns.
>>
>> We are using Ignite 2.15 with the C# client.
>>
>> We have a number of these caches that have four nodes participating in the
>> replicated caches, all with the default PrimarySync write synchronization
>> mode. All data storage configurations are configured with WalMode =
>> WalMode.Fsync.
>>
>> We have logic performing writes against these caches which will continue
>> once the primary node for the replicated cache has written the data item.
>>
>> I am unsure of the guarantees made by Ignite at this point in the event of
>> failure. Specifically, hard/red-button failure of compute hardware resources
>> and/or abrupt (but recoverable) detachment of storage resources.
>>
>> Scenario one: Primary node returns "OK", then immediately fails (before
>> check point). When the primary node returns, should I expect the replicated
>> value to be in the primary, and to appear in all other nodes too?
>>
>> Scenario two: Primary node returns "OK", then a secondary node immediately
>> fails (before achieving the write and so before any check point). When the
>> secondary node returns should I expect the replicated value to be in the
>> recovered secondary node?
>>
>> In relation to these scenarios, does setting the cache write synchronization
>> mode improve the safety of the write, as all nodes must acknowledge the write
>> before it returns?
>>
>> If there is an improvement in write safety in this instance, does this imply
>> the Fsync WalMode write pathway has opportunities for data loss in these
>> failure situations?
>>
>> Thanks,
>> Raymond.
>>
>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> [email protected] <mailto:[email protected]>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>