Early on, running on 2.3, we hit a clear deadlock that I never root-caused, where the cluster simply stopped working. At the time I was using the same DataStreamer from multiple threads, and we had tuned up the buffer size because of that; we were also running against EBS, perhaps with too-short timeouts. We have not seen this on 2.4 with a DataStreamer per producer thread, default parameters, and SSDs. The problem actually seemed to get worse once I paid attention to the Ignite startup message about setting a message queue/buffer size limit and specified one.
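For reference, the limit that startup message refers to can be set on TcpCommunicationSpi. A minimal sketch, assuming Ignite 2.x; the value 1024 and the class name are illustrative, not a recommendation:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class CommSpiQueueLimitConfig {
    public static void main(String[] args) {
        // Bound the per-connection outbound message queue so senders block
        // instead of buffering without limit (0, the default, is unbounded).
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setMessageQueueLimit(1024); // illustrative value

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(commSpi);
        Ignition.start(cfg);
    }
}
```

Note that once the queue is bounded, a sender can stall on a slow receiver, which is exactly the kind of back-pressure interaction I suspect was in play.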
One thing still on my list, however, is to understand more about paired TCP connections and why (or whether) they are the default. Fundamentally, if you send bidirectional request/response pairs over a single TCP virtual circuit, there is an inherent deadlock: responses can get stuck behind requests that are flow-controlled. With a single VC, the only general solution is to assume unlimited memory, reading requests off the VC and queuing them in memory in order to reach the responses behind them. You can bound the receiver's memory use by limiting, at a higher level, the total requests in flight, but as the node count scales the receiver still needs more memory.

I've been assuming that paired connections are an attempt to address this fundamental issue, preventing requests from blocking responses, but I haven't gotten that far yet. My impression was that paired connections are not the default.

-- Sent from: http://apache-ignite-users.70518.x6.nabble.com/
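P.S. If I'm reading the TcpCommunicationSpi API right, paired connections can be toggled as below. This is a sketch for Ignite 2.x; whether it fully resolves the head-of-line scenario described above is exactly the open question:

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class PairedConnectionsConfig {
    public static void main(String[] args) {
        // Use separate TCP connections for inbound and outbound traffic
        // between each pair of nodes, so responses travel on a different
        // socket than (possibly flow-controlled) requests.
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setUsePairedConnections(true); // default is false

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCommunicationSpi(commSpi);
    }
}
```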
