Hello, I am using Ignite with Cassandra, loading data from Cassandra on demand using multiple query statements, but only a (seemingly random) subset of the rows/objects is loaded into Ignite.
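For reference, the load is invoked roughly like this (a simplified sketch; the class name, keyspace, table and column names in the CQL are placeholders, not my real code):

import com.abc.poc.icpoc.model.FcPortStats;
import com.abc.poc.icpoc.model.FcPortStatsKey;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;

public class StatsLoader {
    // Simplified sketch of the load call; the CQL text below is a placeholder.
    static void loadStats(Ignite ignite) {
        IgniteCache<FcPortStatsKey, FcPortStats> cache = ignite.cache("FcPortStatsCache");

        // One CQL statement per Cassandra partition (7 in the real code).
        String[] queries = {
            "SELECT * FROM my_keyspace.fc_port_stats WHERE bucket = 0",
            "SELECT * FROM my_keyspace.fc_port_stats WHERE bucket = 1"
            // ...
        };

        // No filter predicate; the query strings are handed to the Cassandra
        // cache store as load arguments.
        cache.loadCache(null, (Object[]) queries);
    }
}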
When I load using a single query, all rows/objects are loaded correctly. In another environment, the same data, config and code load correctly even with multiple queries; the main difference is that that environment uses older, slower computers. When I repeat the same load request multiple times, each repetition adds a few more rows/objects, until eventually all the rows/objects are loaded into the cache.

The environment where it works (all matching rows are loaded into the cache) is a cluster of 3 old desktop machines with 2 cores each; the same 3 nodes also run Cassandra. The environment where only part of the rows are loaded is a cluster of 3 modern servers (VMs) with 8 cores; again, the same 3 nodes also run Cassandra. The only theory I have at this time is that with more cores, more queries/inserts are executed in parallel, and something goes wrong at that higher level of parallelism.

I am calling loadCache(null, String[]) roughly as sketched above. The string array has 7 queries, one for each partition in Cassandra. I have verified the queries; they return the correct rows when executed in cqlsh. There are no error, warning or info logs during the load, neither from the client nor from the 3 servers.

I turned on additional logging in the environment that has the problem. Because I am loading 15K rows, there are thousands of log entries and analysis is difficult. However, the following entries stand out:

[19:07:45,946][FINE][pool-12-thread-4][GridQueryProcessor] Store [space=FcPortStatsCache, key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=1690890517, hash=-725839099, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=3, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c], val=com.abc.poc.icpoc.model.FcPortStats [idHash=117595137, hash=-1000695178, portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c, dateTime=Fri Jul 07 15:09:00 UTC 2017, portType=1, switchId=339c5bcd-b503-4d54-948e-7ae6c40f31c2, rxUtil=78.479034, txUtil=91.488396, higherUtil=91.488396, lowerUtil=78.479034, rxRate=18411.0, txRate=15424.0, higherRate=18411.0, lowerRate=15424.0, c3Discards=0.0, crcErrors=0.0]]

There are 15,414 of these entries, but only 5,627 objects were loaded into the cache; 15,000 rows match the queries.

[19:07:45,944][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> Remove will not be done for key (entry got replaced or removed): com.abc.poc.icpoc.model.FcPortStatsKey [idHash=326667456, hash=627195288, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=00159ca0-a6c9-47e5-a1f6-f3fe12941ba1]

There are exactly the same number of these (15,414).

[19:07:45,945][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> Ignoring entry for partition that does not belong [key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=235959795, hash=1960141096, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=001648ce-5410-472b-ac32-5b8056b30674], val=FcPortStats: 001648ce-5410-472b-ac32-5b8056b30674, Fri Jul 07 15:09:00 UTC 2017, 1; 53498d64-e53f-4136-a0db-7be9e200cf84 ..., err=class org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtInvalidPartitionException [part=509, msg=Creating partition which does not belong to local node (often may be caused by inconsistent 'key.hashCode()' implementation) [part=509, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], this.topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0]]]]

There are 5,685 of these entries, far fewer than the number of missing objects.
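Since the exception message points at key.hashCode(), this is the shape I understand a "consistent" key is supposed to have, i.e. equals() and hashCode() computed from the same fields so the same logical key always produces the same hash (a minimal sketch using the field names from the logs above; not my actual class):

import java.io.Serializable;
import java.util.Date;
import java.util.Objects;
import java.util.UUID;

// Sketch only; field names are taken from the log output, the real
// FcPortStatsKey may differ.
public class FcPortStatsKey implements Serializable {
    private final UUID portId;
    private final Date hour;
    private final int bucket;
    private final Date dateTime;

    public FcPortStatsKey(UUID portId, Date hour, int bucket, Date dateTime) {
        this.portId = portId;
        this.hour = hour;
        this.bucket = bucket;
        this.dateTime = dateTime;
    }

    // equals() and hashCode() are derived from the same fields, so equal keys
    // always yield the same hash (and therefore map to the same partition).
    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof FcPortStatsKey))
            return false;
        FcPortStatsKey k = (FcPortStatsKey) o;
        return bucket == k.bucket
            && Objects.equals(portId, k.portId)
            && Objects.equals(hour, k.hour)
            && Objects.equals(dateTime, k.dateTime);
    }

    @Override public int hashCode() {
        return Objects.hash(portId, hour, bucket, dateTime);
    }
}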
The message suggests that the hash code for the key may not be good enough. But would that not also apply in the other (slow, but good) environment, or when loading with a single query?

Ignite ver. 2.0.0#20170430-sha1:d4eef3c6

Any suggestions? Thanks... Roger