Hey Igniters,

I have the following setup (Ignite .NET 2.4): 2 server nodes and 1 client node that runs SQL queries on the cache periodically (every 20 seconds in my case). The cache is filled with 110_000 entries from a database, using the "LoadCache" method. The key is a string representation of a number, nothing fancy here.

Situation: Both server nodes are put under pressure by running affinity-run compute jobs on both nodes that touch all cache entries (read, change, and put every entry). I made the following observations:

1. Visorcmd showed that the entries were distributed unevenly: about 60_000 on one node and 34_000 on the other. The same sum (94_000) was reported on the client side on every periodic "tick" when calling "GetSize" on the cache instance (https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core/Cache/ICache.cs#L685).
   * Why are entries missing? Running SELECT Count(*) on the cache with SQLLine reports 110_000 entries.
   * Why are the entries not distributed 50/50 (or nearly 50/50)?
2. On the client, the SQL query invoked on every "tick" sometimes returned 110_000 entries, sometimes 60_000 or 34_000. There was no error or warning in the client or server logs about failing SQL queries.
   * In a partitioned cache, both servers run the query and the results are merged, if I understood correctly. It seems to me that one of the servers sometimes returns an empty result set, so the client gets a result set that is too small. The question is: why does this happen without any warning on the server nodes about a failing query?
3. In that situation, the client is repeatedly unable to load specific entries from the cache using TryGet(TK key, out TV value) (https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core/Cache/ICache.cs#L297). Those entries definitely exist in the cache.
4. In that situation, on one of the two server nodes I get errors that an entry could not be loaded (like in 3), but on the affinity server node!). In my understanding, the compute jobs should be executed on the primary node for the given key. And this node is not able to load an entry by that key (when under heavy CPU pressure)? Something is strange here.

Any ideas?

Cheers,
Dome