Hi all! The question was originally asked (but not answered) on SO: http://stackoverflow.com/questions/43803402/how-does-peer-classloading-work-in-apache-ignite
In short, we get "Failed to deploy user message" exceptions under high load in our project.

Here is an overview of our architecture:
- Distributed cache on three nodes; all nodes run on a single workstation (in this test);
- Workers on each node;
- Messaging between workers is done using IgniteMessaging (the topic type is String, and I've tried both byte[] and ByteBuffer as the message class);
- A client connects to the cluster and triggers some business logic that causes cross-node messaging, scan queries and MR jobs (using IgniteCompute::broadcast). All of these may be performed concurrently.

I've tried both SHARED and CONTINUOUS deployment modes, but the result remains the same.

I've noticed lots of similar messages in the logs:

2017-05-05 13:31:28 INFO org.apache.ignite.logger.java.JavaLogger info Removed undeployed class: GridDeployment [ts=1493980288578, depMode=CONTINUOUS, clsLdr=WebAppClassLoader=MyApp@38815daa, clsLdrId=36c3828db51-0d65e7d5-77bf-444d-9b8b-d18bde94ad13, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=true, usage=0]
...
2017-05-05 13:31:29 INFO org.apache.ignite.logger.java.JavaLogger info Removed undeployed class: GridDeployment [ts=1493980289125, depMode=CONTINUOUS, clsLdr=WebAppClassLoader=MyApp@355f6680, clsLdrId=1dd3828db51-1b20df7a-a98d-45a3-8ab6-e5d229945830, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=true, usage=0]
...

This happens when I use ByteBuffer as the message type. In the case of byte[], the class [B (that is, byte[]) is being constantly re-deployed. The ScanQuery predicate and the IgniteCompute caller are also being constantly re-deployed.

If we disable ScanQueries and IgniteCompute broadcasts, all is fine: there are no re-deployments.

For further testing I've disabled the MR jobs and kept the ScanQueries. I've also added some debug output to a fresh snapshot of Ignite 2.1.0.
Messages "Class locally deployed: <my ScanQuery predicate>" usually come from the following call stack:

org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.recordDeploy(GridDeploymentLocalStore.java:404)
    at org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.deploy(GridDeploymentLocalStore.java:333)
    at org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.getDeployment(GridDeploymentLocalStore.java:201)
    at org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getLocalDeployment(GridDeploymentManager.java:383)
    at org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getDeployment(GridDeploymentManager.java:345)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.injectResources(GridCacheQueryManager.java:918)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanIterator(GridCacheQueryManager.java:826)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.executeQuery(GridCacheQueryManager.java:611)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.queryResult(GridCacheQueryManager.java:1593)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.runQuery(GridCacheQueryManager.java:1164)
    at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.processQueryRequest(GridCacheDistributedQueryManager.java:231)
    at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$2.apply(GridCacheDistributedQueryManager.java:109)
    at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$2.apply(GridCacheDistributedQueryManager.java:107)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
    at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

Messages like "Removed undeployed class" usually come from the IgniteMessaging call stack.

I've analyzed the Ignite kernel a bit, and I suspect that undeploy is triggered for all classes in a classloader when at least one class that resides in that classloader is re-deployed in some other loader. It happens inside org.apache.ignite.spi.deployment.local.LocalDeploymentSpi#register.

At first, we get a "Map of new resources added for registered class loader" using LocalDeploymentSpi#addResource. Then we "Remove resources for all class loaders except {@code ignoreClsLdr}." using LocalDeploymentSpi#removeResources. Inside this method, it looks like every loader that contains the old version of the new resource is added to a "doomed" collection. Finally, we iterate over this collection and call onClassLoaderReleased for each element. That last step is what actually causes all the classes to be undeployed (and, in the end, the "Removed undeployed class" messages).

I don't understand this concept. Why are there multiple classloaders?
Why do we undeploy the whole classloader in such cases? I'd be grateful if someone could explain how peer classloading works in Ignite "under the hood".

P.S. I'm looking at the sources of a fresh snapshot of Ignite 2.1.0, but the behavior is the same with the standard Ignite 1.9.0.

P.P.S. Unfortunately, I have not managed to reproduce this issue outside of our project yet.
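
To make my reading of LocalDeploymentSpi#register concrete, here is a toy model in plain Java. It is not Ignite code: the class DeploymentToyModel, its methods, and the string loader ids are all made up for illustration, and it only mirrors the three steps described above as I understand them (add resources for the registering loader, collect the other loaders that hold the same resource as "doomed", release every doomed loader as a whole). The real implementation may well differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the behavior I suspect; NOT the Ignite implementation.
class DeploymentToyModel {
    // Loader id -> names of the resources registered for that loader.
    private final Map<String, Set<String>> resourcesByLoader = new LinkedHashMap<>();

    // Registers resources for one loader and returns the ids of the loaders
    // that were released (undeployed as a whole) as a side effect.
    List<String> register(String loaderId, String... resourceNames) {
        // Step 1: add the resources for the registering loader and remember
        // which of them are new for this loader.
        Set<String> own = resourcesByLoader.computeIfAbsent(loaderId, k -> new LinkedHashSet<>());
        List<String> added = new ArrayList<>();
        for (String name : resourceNames) {
            if (own.add(name)) {
                added.add(name);
            }
        }

        // Step 2: every OTHER loader holding one of the newly added resources is "doomed".
        List<String> doomed = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : resourcesByLoader.entrySet()) {
            if (!e.getKey().equals(loaderId) && !Collections.disjoint(e.getValue(), added)) {
                doomed.add(e.getKey());
            }
        }

        // Step 3: release each doomed loader, dropping ALL of its resources,
        // not only the one that clashed.
        for (String id : doomed) {
            resourcesByLoader.remove(id);
        }
        return doomed;
    }

    // Resources currently registered for a loader (empty if the loader was released).
    Set<String> resourcesOf(String loaderId) {
        Set<String> res = resourcesByLoader.get(loaderId);
        return res == null ? Collections.<String>emptySet() : res;
    }

    public static void main(String[] args) {
        DeploymentToyModel model = new DeploymentToyModel();
        model.register("loaderA", "MyPredicate", "java.lang.String");
        List<String> doomed = model.register("loaderB", "MyPredicate");
        System.out.println("doomed = " + doomed);
        System.out.println("loaderA resources = " + model.resourcesOf("loaderA"));
        System.out.println("loaderB resources = " + model.resourcesOf("loaderB"));
    }
}
```

In this model, registering MyPredicate under loaderB releases loaderA entirely, so java.lang.String disappears along with it even though it never clashed. That is the effect I believe I am seeing in the logs, and the part I would like to have explained.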