Hello.

I don’t know the cause of your issue, but there is a feature that can help you 
overcome it: Read Repair [1].

Consistency repair can be run from control.sh.

```
./bin/control.sh --enable-experimental
...
  [EXPERIMENTAL]
  Check/Repair cache consistency using Read Repair approach:
    control.(sh|bat) --consistency repair cache-name partition

    Parameters:
      cache-name  - Cache to be checked/repaired.
      partition   - Cache's partition to be checked/repaired.

  [EXPERIMENTAL]
  Cache consistency check/repair operations status:
    control.(sh|bat) --consistency status

  [EXPERIMENTAL]
  Finalize partitions update counters:
    control.(sh|bat) --consistency finalize
```

It seems the built-in help for this command is incomplete.
The command also accepts a strategy argument, so you can control the repair 
actions more precisely.
Try running:

```
❯ ./bin/control.sh --enable-experimental --consistency repair --cache default --strategy CHECK_ONLY --partitions 1,2,3,…your_partitions_list...
```
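If the partitions you want to check form a contiguous range, typing the list by hand gets tedious. Below is a minimal sketch that builds the comma-separated list and prints the resulting command. The flags are the ones from the command above. The helper name `build_partition_list`, the `FIRST`/`LAST`/`DRY_RUN` variables and the default range 0..3 are my own illustrative assumptions, not part of control.sh.

```shell
#!/bin/sh
# Sketch only: builds a --partitions list and prints (or runs) the repair
# command quoted above. Helper name, env variables and the default range
# are assumptions made for illustration.

# Print "first,first+1,...,last" as one comma-separated line.
build_partition_list() {
  bpl_p=$1
  bpl_last=$2
  bpl_list=""
  while [ "$bpl_p" -le "$bpl_last" ]; do
    # Prepend "existing," only when the list is not empty.
    bpl_list="${bpl_list:+$bpl_list,}$bpl_p"
    bpl_p=$((bpl_p + 1))
  done
  printf '%s\n' "$bpl_list"
}

CACHE=${CACHE:-default}
STRATEGY=${STRATEGY:-CHECK_ONLY}
PARTS=$(build_partition_list "${FIRST:-0}" "${LAST:-3}")

# Store the command in the positional parameters so it can be
# either printed or executed unchanged.
set -- ./bin/control.sh --enable-experimental --consistency repair \
  --cache "$CACHE" --strategy "$STRATEGY" --partitions "$PARTS"

if [ "${DRY_RUN:-1}" = 1 ]; then
  echo "$*"      # dry run: only show the command
else
  "$@"           # actually run it
fi
```

By default this only prints the command. Set `DRY_RUN=0` to execute it, and `FIRST`/`LAST` to the partition range you care about. Starting with CHECK_ONLY, as above, lets you look at the reported inconsistencies before choosing a strategy that modifies data.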

The available strategies are well described in the source code [2].


[1] https://ignite.apache.org/docs/latest/key-value-api/read-repair
[2] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java



> On 21 Aug 2023, at 07:46, Raymond Wilson <[email protected]> wrote:
> 
> [Replying onto correct thread]
> 
> As a follow up to this email, we are starting to collect evidence that 
> replicated caches within our Ignite grid are failing to replicate values in a 
> small number of cases. 
> 
> In the cases we observe so far, with a cluster of 4 nodes participating in a 
> replicated cache, only one node reports having the correct value for a key, 
> and the other three report having no value for that key.
> 
> The documentation is pretty opinionated about the 
> CacheWriteSynchronizationMode not being impactful with respect to consistency 
> for replicated caches. As noted below, we use PrimarySync (the default) for 
> these caches, which would suggest a potential failure mode preventing the 
> backup copies obtaining their copy once the primary copy has been written.
>   
> We are continuing to investigate and would be interested in any suggestions 
> you may have as to the likely cause.
> 
> Thanks,
> Raymond.
> 
> On Thu, Jul 27, 2023 at 12:38 PM Raymond Wilson <[email protected]> wrote:
>> Hi,
>> 
>> I have a query regarding data safety of replicated caches in the case of 
>> hard failure of the compute resource but where the storage resource is 
>> available when the node returns.
>> 
>> We are using Ignite 2.15 with the C# client.
>> 
>> We have a number of these caches that have four nodes participating in the 
>> replicated caches, all with the default PrimarySync write synchronization 
>> mode. All data storage configurations are configured with WalMode = 
>> WalMode.Fsync.
>> 
>> We have logic performing writes against these caches which will continue 
>> once the primary node for the replicated cache has written the data item.
>> 
>> I am unsure of the guarantees made by Ignite at this point in the event of 
>> failure. Specifically, hard/red-button failure of compute hardware resources 
>> and/or abrupt (but recoverable) detachment of storage resources.
>> 
>> Scenario one: Primary node returns "OK", then immediately fails (before a 
>> checkpoint). When the primary node returns, should I expect the replicated 
>> value to be in the primary, and to appear in all other nodes too?
>> 
>> Scenario two: Primary node returns "OK", then a secondary node immediately 
>> fails (before achieving the write and so before any check point). When the 
>> secondary node returns should I expect the replicated value to be in the 
>> recovered secondary node?
>> 
>> In relation to these scenarios, does setting the cache write synchronization 
>> mode improve the safety of the write as all nodes must acknowledge the write 
>> before it returns?
>> 
>> If there is an improvement in write safety in this instance, does this imply 
>> the Fsync WalMode write pathway has opportunities for data loss in these 
>> failure situations?
>> 
>> Thanks,
>> Raymond.
>> 
>> 
>> 
>> 
>> -- 
>>  <http://www.trimble.com/>
>> Raymond Wilson
>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> [email protected] <mailto:[email protected]>
>>  
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
> 
