Hi community,

We have 4 servers and 4 tables, with backups = 1, cacheMode = PARTITIONED, partitionLossPolicy = READ_ONLY_SAFE, persistenceEnabled = true, on Ignite version 2.7.6.
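For reference, our cache settings correspond roughly to the following configuration (the cache name here is a placeholder; persistence is enabled on the default data region):

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="tableName"/>          <!-- placeholder cache name -->
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
</bean>

<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
            </bean>
        </property>
    </bean>
</property>
```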

While we were deleting the data of a table that held a large amount of data, three of the servers failed with OutOfMemoryError.

After restarting the 3 failed servers, we found that the table data could not be queried, and the following error was thrown:

 err=Failed to execute query because cache partition has been lost

At this point we executed the following command:

./control.sh --cache reset_lost_partitions tableName
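The recovery steps we ran were roughly as follows (the cache name is a placeholder; both subcommands are from the control.sh utility and assume a reachable cluster node):

```shell
# List caches to confirm the exact cache name backing the SQL table
./control.sh --cache list .

# Reset the LOST state of the affected cache's partitions
./control.sh --cache reset_lost_partitions tableName
```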

After that, the table data could be queried again, and the total row count was correct, with no data loss.

However, when we ran the cache -a command in Visor, we saw the following:

We found that one server held no primary partition data and another server held no backup partition data, which leads to significant data imbalance. All partitioned tables are imbalanced in the same way, matching the distribution pattern in the figure above.

At this point, if the entire cluster is restarted, everything returns to normal, and the data is distributed as follows:

My questions are:

1. Is there any way to see which partitions on which nodes are lost?

2. In the end, it seems no partitions were actually lost; only the partition state was wrong. What causes partitions to be marked as lost?

3. What causes the data imbalance? Besides adding/removing nodes, is there any way to trigger data rebalancing manually?
