[ https://issues.apache.org/jira/browse/IGNITE-12950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Mironovich reassigned IGNITE-12950:
----------------------------------------

    Assignee: Ivan Mironovich

> Partitions validator must check sizes even if update counters are different
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-12950
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12950
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>            Reporter: Ivan Mironovich
>            Assignee: Ivan Mironovich
>            Priority: Major
>             Fix For: 2.9
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We have the following method in GridDhtPartitionsStateValidator:
> {code:java}
> public void validatePartitionCountersAndSizes(
>     GridDhtPartitionsExchangeFuture fut,
>     GridDhtPartitionTopology top,
>     Map<UUID, GridDhtPartitionsSingleMessage> messages
> ) throws IgniteCheckedException {
>     final Set<UUID> ignoringNodes = new HashSet<>();
>
>     // Ignore just joined nodes.
>     for (DiscoveryEvent evt : fut.events().events()) {
>         if (evt.type() == EVT_NODE_JOINED)
>             ignoringNodes.add(evt.eventNode().id());
>     }
>
>     AffinityTopologyVersion topVer = fut.context().events().topologyVersion();
>
>     // Validate update counters.
>     Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);
>
>     if (!result.isEmpty())
>         throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));
>
>     // For sizes validation ignore also nodes which are not able to send cache sizes.
>     for (UUID id : messages.keySet()) {
>         ClusterNode node = cctx.discovery().node(id);
>
>         if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
>             ignoringNodes.add(id);
>     }
>
>     if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) { // TODO: Remove "if" clause in IGNITE-9451.
>         // Validate cache sizes.
>         result = validatePartitionsSizes(top, messages, ignoringNodes);
>
>         if (!result.isEmpty())
>             throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
>     }
> }
> {code}
> Because the method throws as soon as the update counters turn out to be inconsistent, the size check is skipped in that case. We should check partition sizes even if update counters are different. This could be helpful for debugging problems in production.
> If a partition is in an inconsistent state, we must print information about all of its copies. Currently, for a cache group with 3 backups, we can get a message like this:
> {code:java}
> Partition states validation has failed for group: CACHEGROUP. Partitions update counters are inconsistent for Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
> {code}
> (Part 4960 contains information about only 2 copies.)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
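
The behaviour requested in the issue can be illustrated with a small standalone sketch. It is not Ignite code and not the actual fix: the class PartitionValidationSketch, its methods, and the use of plain String node names and nested maps in place of GridDhtPartitionsSingleMessage are all assumptions made for illustration. The sketch runs both checks before failing and reports every copy of each inconsistent partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for the validator; none of these names exist in Ignite. */
final class PartitionValidationSketch {
    /**
     * Takes node -> (partition -> value) and returns partition -> (node -> value)
     * for every partition whose copies disagree. All copies of such a partition are kept.
     */
    static Map<Integer, Map<String, Long>> findInconsistent(Map<String, Map<Integer, Long>> valuesByNode) {
        // Regroup the reported values by partition.
        Map<Integer, Map<String, Long>> byPart = new TreeMap<>();

        for (Map.Entry<String, Map<Integer, Long>> nodeEntry : valuesByNode.entrySet()) {
            for (Map.Entry<Integer, Long> partEntry : nodeEntry.getValue().entrySet()) {
                byPart.computeIfAbsent(partEntry.getKey(), p -> new TreeMap<>())
                    .put(nodeEntry.getKey(), partEntry.getValue());
            }
        }

        // Drop partitions where all copies report the same value.
        byPart.values().removeIf(copies -> copies.values().stream().distinct().count() <= 1);

        return byPart;
    }

    /** Formats the result in a layout similar to the message quoted in the issue. */
    static String fold(Map<Integer, Map<String, Long>> inconsistent) {
        StringBuilder sb = new StringBuilder();

        for (Map.Entry<Integer, Map<String, Long>> part : inconsistent.entrySet()) {
            sb.append("Part ").append(part.getKey()).append(": [");

            for (Map.Entry<String, Long> copy : part.getValue().entrySet())
                sb.append(copy.getKey()).append('=').append(copy.getValue()).append(' ');

            sb.append("] ");
        }

        return sb.toString();
    }

    /** Runs both checks and only then fails, so a counter mismatch does not hide a size mismatch. */
    static void validate(Map<String, Map<Integer, Long>> counters, Map<String, Map<Integer, Long>> sizes) {
        List<String> errors = new ArrayList<>();

        Map<Integer, Map<String, Long>> res = findInconsistent(counters);

        if (!res.isEmpty())
            errors.add("Partitions update counters are inconsistent for " + fold(res));

        res = findInconsistent(sizes);

        if (!res.isEmpty())
            errors.add("Partitions cache sizes are inconsistent for " + fold(res));

        if (!errors.isEmpty())
            throw new IllegalStateException(String.join(" ", errors));
    }
}
```

The sketch leaves out the handling of just-joined nodes, nodes too old to send sizes, and MVCC-enabled groups shown in the quoted method; a real change would need to keep those rules.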