Hey Igniters,

I have the following setup (Ignite .NET 2.4): 2 server nodes and 1 client node 
that runs SQL queries on the cache periodically (every 20 seconds in my case).
The cache is filled with 110_000 entries from a database, using the "LoadCache" 
method. The key is a string representation of a number, nothing fancy here.

Situation: Both server nodes are put under pressure by affinity-run compute 
jobs that touch all cache entries (read, change, put every entry).

I made the following observations:

  1.  Visorcmd showed the entries distributed unevenly: 60_000 on one node 
and 34_000 on the other. The same sum (94_000) was reported on the client side 
on every periodic "tick" when calling "GetSize" on the cache instance 
(https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core/Cache/ICache.cs#L685).

     *   Why are entries missing? Running SELECT Count(*) on the cache 
with SQLLine reports 110_000 entries.
     *   Why are the entries not distributed 50/50 (or nearly 50/50)?


  2.  On the client, the SQL query invoked on every "tick" sometimes returned 
110_000 entries, sometimes 60_000 or 34_000. There was no error or warning in 
the client or server log about failing SQL queries.
     *   In a partitioned cache both servers run the query and the results are 
merged, if I understood correctly. It seems to me that one of the servers 
sometimes returns an empty result set, so the client gets a result set that is 
too small. The question is: why does this happen without even a warning on the 
server nodes about a failing query?
  3.  In that situation the client repeatedly fails to load a specific entry 
from the cache using TryGet(TK key, out TV value) 
(https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core/Cache/ICache.cs#L297).
 Those entries definitely exist in the cache.
  4.  In that situation I get errors on one of the two server nodes that an 
entry could not be loaded (like in 3, but on the affinity server node!). In my 
understanding the compute jobs should be executed on the primary node for the 
given key. And this node is not able to load an entry by that key (when under 
heavy CPU pressure)?
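
To make my assumption about the merged SQL result concrete, here is a toy model 
in plain Python. It is not Ignite code and does not claim to reflect how Ignite 
executes queries internally; scatter_gather, make_node and the key split are 
names and numbers I made up for illustration. It only shows that a silently 
empty partial result from one node would give the client a smaller merged 
result without any error being raised.

```python
from typing import Callable, Dict, List

# Toy model only: plain Python, no Ignite API involved.
# A "node" is modelled as a function returning the rows it owns.
NodeQuery = Callable[[], List[str]]


def make_node(keys: List[str], responsive: bool = True) -> NodeQuery:
    """Build a hypothetical node; an unresponsive node returns no rows."""
    def run_query() -> List[str]:
        return list(keys) if responsive else []
    return run_query


def scatter_gather(nodes: Dict[str, NodeQuery]) -> List[str]:
    """Run the query on every node and concatenate the partial results."""
    merged: List[str] = []
    for run_query in nodes.values():
        merged.extend(run_query())
    return merged


# Hypothetical split of 11 string keys over two nodes (6 and 5 keys).
a_keys = [str(i) for i in range(0, 6)]
b_keys = [str(i) for i in range(6, 11)]

# Both nodes answer: the client sees every row.
full = scatter_gather({"A": make_node(a_keys), "B": make_node(b_keys)})
print(len(full))  # 11

# Node B silently returns nothing: the client only sees node A's rows.
partial = scatter_gather(
    {"A": make_node(a_keys), "B": make_node(b_keys, responsive=False)}
)
print(len(partial))  # 6
```

In this model the merge step cannot tell "node has no matching rows" apart from 
"node failed to answer", which is the behaviour I suspect I am seeing.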

Something is strange here. Any ideas?

Cheers,
Dome
