Hi community,
We have 4 servers and 4 tables, with backups = 1, cacheMode =
PARTITIONED, partitionLossPolicy = READ_ONLY_SAFE, persistenceEnabled =
true, running Ignite version 2.7.6.
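For reference, this is roughly how our setup looks in code (a sketch only; the cache name "tableName" is a placeholder, not our real identifier):

```java
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClusterConfig {
    public static IgniteConfiguration build() {
        IgniteConfiguration igniteCfg = new IgniteConfiguration();

        // persistenceEnabled = true (set on the default data region)
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        igniteCfg.setDataStorageConfiguration(storageCfg);

        // cacheMode = PARTITIONED, backups = 1,
        // partitionLossPolicy = READ_ONLY_SAFE
        CacheConfiguration<Object, Object> cacheCfg =
            new CacheConfiguration<>("tableName");
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);
        cacheCfg.setBackups(1);
        cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);
        igniteCfg.setCacheConfiguration(cacheCfg);

        return igniteCfg;
    }
}
```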
While deleting a large amount of data from one table, three of the four
servers crashed with OutOfMemoryError.
After restarting the three failed servers, we found that the table data
could not be queried, and the following error was thrown:
err=Failed to execute query because cache partition has been lost
At that point we executed the following command:
./control.sh --cache reset_lost_partitions tableName;
After that, the table data could be queried again, and the total row
count was correct, with no data loss.
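As far as we understand, the same reset can also be done programmatically (a sketch, assuming it runs on a started node and the cache backing the table is named "tableName"):

```java
import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ResetLost {
    public static void main(String[] args) {
        // Assumes Ignition.ignite() returns an already-started node.
        Ignite ignite = Ignition.ignite();

        // Programmatic equivalent of:
        //   ./control.sh --cache reset_lost_partitions tableName
        ignite.resetLostPartitions(Collections.singleton("tableName"));
    }
}
```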
However, when we ran the cache -a command in visor, we saw the
following: one server held no primary partition data and another server
held no backup partition data, resulting in a significant data
imbalance. Every partitioned table showed the same skewed distribution
as in the figure above.
If we then restart the entire cluster, everything returns to normal,
and the data is distributed as follows:
My questions are:
1. Is there any way to see which partitions on which nodes are lost?
2. In the end, it seems that no partitions were actually lost, only
their status was wrong. What causes partitions to be marked lost?
3. What causes the data imbalance? Besides adding/removing nodes, is
there any way to trigger data rebalancing manually?
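For questions 1 and 3, is something like the following the right approach? This is just a sketch of the APIs we found (again assuming a cache named "tableName" and a started node):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class PartitionInspection {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite();
        IgniteCache<Object, Object> cache = ignite.cache("tableName");

        // Q1: partitions currently marked lost for this cache.
        System.out.println("Lost partitions: " + cache.lostPartitions());

        // Print how partitions map to server nodes, to spot the imbalance.
        Affinity<Object> aff = ignite.affinity("tableName");
        for (ClusterNode node : ignite.cluster().forServers().nodes()) {
            System.out.println(node.consistentId()
                + " primaries=" + aff.primaryPartitions(node).length
                + " backups=" + aff.backupPartitions(node).length);
        }

        // Q3: request rebalancing of this cache manually and wait for it.
        cache.rebalance().get();
    }
}
```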