Hello,

I am using Ignite with Cassandra and loading data from Cassandra on demand using 
multiple query statements. However, only a (seemingly random) subset of the 
rows/objects is loaded into Ignite.

When I load using a single query, all rows/objects are loaded correctly.

In another environment, the same data, config, and code load correctly with 
multiple queries. The main difference is that the working environment uses 
older, slower computers.

When I repeat the same load request multiple times, each repetition adds a few 
more rows/objects, until eventually all the rows/objects are loaded into the 
cache.

The environment where it works (all matching rows are loaded into the cache) is 
a cluster of 3 old desktop machines with 2 cores each. The same 3 nodes also 
run Cassandra.

The environment where only part of the rows are loaded is a cluster of 3 
modern servers (VMs) with 8 cores each. The same 3 nodes also run Cassandra.

The only theory I have at this time is that with more cores, more queries and 
inserts are executed in parallel, and something goes wrong at that higher 
level of parallelism.

I am calling loadCache(null, String[]). The string array holds 7 queries, one 
for each partition in Cassandra.
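
For reference, the load call looks roughly like this (a minimal sketch, not my 
actual code; the cache name and key/value classes are taken from the logs 
below, and the CQL statements are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

// Minimal sketch of the load call. The real array has 7 CQL statements,
// one per Cassandra partition; the statements shown here are placeholders.
Ignite ignite = Ignition.ignite();
IgniteCache<FcPortStatsKey, FcPortStats> cache = ignite.cache("FcPortStatsCache");

String[] queries = new String[] {
    "SELECT * FROM fc_port_stats WHERE bucket = 1",
    "SELECT * FROM fc_port_stats WHERE bucket = 2",
    // ... 5 more, one per partition
};

// null filter: keep every row the queries return
cache.loadCache(null, queries);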

I have verified the queries; they return the correct rows when executed in 
cqlsh.

There are no error, warning, or info logs during the load, either from the 
client or from the 3 servers.

I turned on additional logging in the environment that has the problem. Because 
I am loading 15K rows, there are thousands of log entries and analysis is 
difficult. However, the following entries stand out:


[19:07:45,946][FINE][pool-12-thread-4][GridQueryProcessor] Store 
[space=FcPortStatsCache, key=com.abc.poc.icpoc.model.FcPortStatsKey 
[idHash=1690890517, hash=-725839099, hour=Fri Jul 07 15:00:00 UTC 2017, 
bucket=3, dateTime=Fri Jul 07 15:09:00 UTC 2017, 
portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c], 
val=com.abc.poc.icpoc.model.FcPortStats [idHash=117595137, hash=-1000695178, 
portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c, dateTime=Fri Jul 07 15:09:00 UTC 
2017, portType=1, switchId=339c5bcd-b503-4d54-948e-7ae6c40f31c2, 
rxUtil=78.479034, txUtil=91.488396, higherUtil=91.488396, lowerUtil=78.479034, 
rxRate=18411.0, txRate=15424.0, higherRate=18411.0, lowerRate=15424.0, 
c3Discards=0.0, crcErrors=0.0]]

There are 15,414 of these logs. Only 5627 objects were loaded into the cache. 
15,000 rows match the queries.


[19:07:45,944][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> 
Remove will not be done for key (entry got replaced or removed): 
com.abc.poc.icpoc.model.FcPortStatsKey [idHash=326667456, hash=627195288, 
hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00 UTC 
2017, portId=00159ca0-a6c9-47e5-a1f6-f3fe12941ba1]

There are exactly the same number of these (15,414).


[19:07:45,945][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> 
Ignoring entry for partition that does not belong 
[key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=235959795, hash=1960141096, 
hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00 UTC 
2017, portId=001648ce-5410-472b-ac32-5b8056b30674], val=FcPortStats: 
001648ce-5410-472b-ac32-5b8056b30674, Fri Jul 07 15:09:00 UTC 2017, 1; 
53498d64-e53f-4136-a0db-7be9e200cf84 ..., err=class 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtInvalidPartitionException
 [part=509, msg=Creating partition which does not belong to local node (often 
may be caused by inconsistent 'key.hashCode()' implementation) [part=509, 
topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], 
this.topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0]]]]

There are 5,685 of these log entries, far fewer than the number of missing objects.
The message suggests that the hash code for the key may not be good enough. But 
would that not also apply in the other (slow, but working) environment, or when 
loading with a single query?
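
For reference, a key class with a deterministic hashCode along these lines is 
what I understand the exception message to be asking for (a hypothetical sketch 
based only on the fields visible in the logs, not my actual class):

import java.io.Serializable;
import java.util.Date;
import java.util.Objects;
import java.util.UUID;

// Hypothetical sketch of the key class, using the fields seen in the log output.
// equals()/hashCode() depend only on the key fields and are deterministic,
// which is what the "inconsistent 'key.hashCode()'" hint in the exception refers to.
public class FcPortStatsKey implements Serializable {
    private Date hour;
    private int bucket;
    private Date dateTime;
    private UUID portId;

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;

        if (!(o instanceof FcPortStatsKey))
            return false;

        FcPortStatsKey other = (FcPortStatsKey)o;

        return bucket == other.bucket
            && Objects.equals(hour, other.hour)
            && Objects.equals(dateTime, other.dateTime)
            && Objects.equals(portId, other.portId);
    }

    @Override public int hashCode() {
        return Objects.hash(hour, bucket, dateTime, portId);
    }
}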


Ignite version: 2.0.0#20170430-sha1:d4eef3c6


Any suggestions?

Thanks...

Roger
