This sounds strange. There definitely should be a cause for such behaviour. Rebalancing happens only after a topology change (node join/leave, deactivation/activation). Could you please share logs from the node with the exception you mentioned in your message (node id "5423e6b5-c9be-4eb8-8f68-e643357ec2b3") and from the coordinator (oldest) node (you can find this node by grepping for "crd=true" in the logs), so we can find the root cause of this behaviour? Cache configurations / data storage configurations would also be very useful for debugging.
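To locate the coordinator's log, a grep like the one below works. The demo uses synthetic log lines and a made-up /tmp directory; point it at your real Ignite work/log directory, where the discovery messages contain "crd=true" on the coordinator:

```shell
# Demo with synthetic logs (replace with your actual Ignite log directory).
mkdir -p /tmp/ignite-demo-logs
echo "Topology snapshot [ver=5, servers=3, crd=true]"  > /tmp/ignite-demo-logs/node1.log
echo "Topology snapshot [ver=5, servers=3, crd=false]" > /tmp/ignite-demo-logs/node2.log

# The coordinator's log is the one containing crd=true:
grep -l "crd=true" /tmp/ignite-demo-logs/node*.log
```

This prints the path of node1.log only, since node2's log says crd=false.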
1) If rebalancing didn't happen, you should see MOVING partitions in your cache groups (via the metrics MBeans or Visor). Whether you can write data to and read from such partitions depends on the PartitionLossPolicy configured for your caches. As long as there is at least one owner (a copy in the OWNING state) for each such partition, there is no data loss. MOVING partitions will be properly rebalanced after a node restart, and the data will become consistent between primary and backup partitions.

2) If part*.bin files are corrupted, you will typically notice it only during a node restart, during a subsequent cluster deactivation/activation, or if your data set is larger than RAM and the node swaps (replaces) pages to/from disk. During normal cluster operation this is undetectable, since all data resides in RAM.

Wed, 26 Dec 2018 at 13:44, aMark <feku.fa...@gmail.com>:

> Thanks Pavel for prompt response.
>
> I could confirm that node "5423e6b5-c9be-4eb8-8f68-e643357ec2b3" (and no
> other node in the cluster) did not go down, not sure how did stale data
> cropped up on few nodes. And this type of exception is coming from every
> server node in the cluster.
>
> What happens if re-balancing did not happen properly due to this exception,
> could it lead to data loss ?
> does data get corrupted on the part*.bin files (in persistent store) in the
> Ignite cache due to this exception ?
>
> Thanks,
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
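Regarding point 1 above, a minimal sketch of setting the PartitionLossPolicy on a cache follows. The cache name and backup count are placeholders for illustration; pick the policy that matches your consistency requirements:

```java
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CacheLossPolicyConfig {
    public static IgniteConfiguration config() {
        // "myCache" is a placeholder cache name.
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("myCache");

        // With at least 1 backup, losing a single node still leaves an OWNING copy.
        ccfg.setBackups(1);

        // READ_ONLY_SAFE: writes fail and reads of lost partitions throw an
        // exception, instead of silently serving incomplete data.
        ccfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);

        return new IgniteConfiguration().setCacheConfiguration(ccfg);
    }
}
```

With a SAFE policy configured, operations against lost partitions fail fast, so the application notices the problem instead of reading stale or partial data.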