Hi community,

We have 4 servers and 4 tables, with backups = 1, cacheMode = PARTITIONED, partitionLossPolicy = READ_ONLY_SAFE, persistenceEnabled = true, on Ignite version 2.7.6.
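For reference, our cache settings correspond roughly to the following configuration (the cache name here is a placeholder; persistence is enabled on the default data region):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="tableName"/>          <!-- placeholder cache name -->
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
</bean>

<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
            </bean>
        </property>
    </bean>
</property>
```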

While we were deleting the data of a table that held a large amount of data, three of the servers failed with OutOfMemoryError.

After restarting the 3 failed servers, we found that the table data could not be queried, and the following error was thrown:

 err=Failed to execute query because cache partition has been lost

At this point we executed the following command:

./control.sh --cache reset_lost_partitions tableName
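The recovery steps we ran were roughly as follows (the cache name is a placeholder; both subcommands are from the control.sh utility and assume a reachable cluster node):

```shell
# List caches to confirm the exact cache name backing the SQL table
./control.sh --cache list .

# Reset the LOST state of the affected cache's partitions
./control.sh --cache reset_lost_partitions tableName
```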

After that, the table data could be queried again, and the total row count was correct, with no data loss.

However, when we ran the cache -a command in Visor, we saw the following:

We found that one server held no primary partition data and another server held no backup partition data, which leads to significant data imbalance. All partitioned tables are imbalanced in the same way, matching the distribution pattern in the figure above.

At this point, if the entire cluster is restarted, everything returns to normal, and the data is distributed as follows:

My questions are:

1. Is there any way to see which partitions on which nodes are lost?

2. In the end, it seems no partitions were actually lost; only the partition state was wrong. What causes partitions to be marked as lost?

3. What causes the data imbalance? Besides adding/removing nodes, is there any way to trigger data rebalancing manually?
