Understanding the mechanics of peer class loading

Ilya Fri, 12 May 2017 05:58:05 -0700

Hi all!

The question was originally asked (but not answered) on SO:
http://stackoverflow.com/questions/43803402/how-does-peer-classloading-work-in-apache-ignite


In short, we have "Failed to deploy user message" exceptions under high load
in our project.

Here is an overview of our architecture:
- Distributed cache on three nodes, all nodes run on a single workstation
(in this test);
- Workers on each node;
- Messaging between workers is done using IgniteMessaging (topic has the
type of String and I've tried both byte[] and ByteBuffer as a message
class);
- Client connects to the cluster and triggers some business logic that
causes cross-node messaging, scan queries and MR jobs (using
IgniteCompute::broadcast). All of these may be performed concurrently.

I've tried both the SHARED and CONTINUOUS deployment modes, but the result
remains the same.

I've noticed lots of similar messages in the logs:
2017-05-05 13:31:28 INFO   org.apache.ignite.logger.java.JavaLogger info
Removed undeployed class: GridDeployment [ts=1493980288578,
depMode=CONTINUOUS, clsLdr=WebAppClassLoader=MyApp@38815daa,
clsLdrId=36c3828db51-0d65e7d5-77bf-444d-9b8b-d18bde94ad13, userVer=0,
loc=true, sampleClsName=java.lang.String, pendingUndeploy=false,
undeployed=true, usage=0]
...
2017-05-05 13:31:29 INFO   org.apache.ignite.logger.java.JavaLogger info
Removed undeployed class: GridDeployment [ts=1493980289125,
depMode=CONTINUOUS, clsLdr=WebAppClassLoader=MyApp@355f6680,
clsLdrId=1dd3828db51-1b20df7a-a98d-45a3-8ab6-e5d229945830, userVer=0,
loc=true, sampleClsName=java.lang.String, pendingUndeploy=false,
undeployed=true, usage=0]
...

This happens when I use ByteBuffer as the message type. In the case of
byte[], class [B is being constantly re-deployed.
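For readers puzzled by the class names in these messages: the JVM reports byte[] under the binary name [B, and a heap ByteBuffer under java.nio.HeapByteBuffer. The sketch below only prints those names using the plain JDK; it does not touch Ignite, and the class and method names in it (MessageClassNames, runtimeClassName) are made up for illustration.

```java
import java.nio.ByteBuffer;

public class MessageClassNames {
    /** Returns the JVM binary name of the runtime class of the given message object. */
    public static String runtimeClassName(Object msg) {
        return msg.getClass().getName();
    }

    public static void main(String[] args) {
        // Array classes use a JVM-specific naming scheme: "[" plus the element type code.
        System.out.println(runtimeClassName(new byte[0]));            // [B

        // ByteBuffer.allocate returns a heap-backed subclass of ByteBuffer.
        System.out.println(runtimeClassName(ByteBuffer.allocate(1))); // java.nio.HeapByteBuffer

        // Both byte[] and String come from the bootstrap loader, which Java reports as null.
        System.out.println(byte[].class.getClassLoader());            // null
        System.out.println(String.class.getClassLoader());            // null
    }
}
```

None of these JDK classes belong to the web application's class loader, so it may be worth checking why they show up in deployment records at all.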

ScanQuery predicate and IgniteCompute caller are also being constantly
re-deployed.
If we disable ScanQueries and IgniteCompute broadcasts - all is fine, there
are no re-deployments.

For further testing I've disabled the MR jobs and kept the ScanQueries. I've
also added some debug output to a fresh snapshot of Ignite 2.1.0. Messages
"Class locally deployed: <my ScanQuery predicate>" usually come from the
following call stack:
org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.recordDeploy(GridDeploymentLocalStore.java:404)
        at org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.deploy(GridDeploymentLocalStore.java:333)
        at org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.getDeployment(GridDeploymentLocalStore.java:201)
        at org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getLocalDeployment(GridDeploymentManager.java:383)
        at org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getDeployment(GridDeploymentManager.java:345)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.injectResources(GridCacheQueryManager.java:918)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.scanIterator(GridCacheQueryManager.java:826)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.executeQuery(GridCacheQueryManager.java:611)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.queryResult(GridCacheQueryManager.java:1593)
        at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.runQuery(GridCacheQueryManager.java:1164)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.processQueryRequest(GridCacheDistributedQueryManager.java:231)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$2.apply(GridCacheDistributedQueryManager.java:109)
        at org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$2.apply(GridCacheDistributedQueryManager.java:107)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
        at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

Messages like "Removed undeployed class" usually come from the
IgniteMessaging call stack.

I've analyzed the Ignite kernel a bit, and I suspect that undeploy is
triggered for all classes in a classloader when at least one class that
resides in that classloader is re-deployed in some other loader.

It happens inside
org.apache.ignite.spi.deployment.local.LocalDeploymentSpi#register

    First, we get a "Map of new resources added for registered class
loader" using LocalDeploymentSpi#addResource.
    Then we "Remove resources for all class loaders except {@code
ignoreClsLdr}." using LocalDeploymentSpi#removeResources. Inside this
method, it looks like we add all loaders that contain the old version of the
new resource to a "doomed" collection.
    Finally, we iterate this collection and call onClassLoaderReleased for
each element. The latter action actually causes all the classes to be
undeployed (finally causing the "Removed undeployed class" messages).
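To make the three steps above concrete, here is a toy model of the flow as I read it. This is not Ignite code: the class DoomedLoadersSketch, its methods, and the loader ids are all hypothetical, and the real LocalDeploymentSpi may differ in details I have missed. It only illustrates how registering one class under a new loader can release every class held by another loader.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the register/undeploy flow described above; NOT Ignite code. */
public class DoomedLoadersSketch {
    // Hypothetical registry: loader id -> names of classes registered for that loader.
    private final Map<String, Set<String>> resourcesByLoader = new LinkedHashMap<>();

    /**
     * Registers a class name for a loader and returns the ids of the loaders
     * that were released as a side effect.
     */
    public List<String> register(String loaderId, String className) {
        // Step 1: add the resource for the registering loader.
        resourcesByLoader.computeIfAbsent(loaderId, k -> new HashSet<>()).add(className);

        // Step 2: every OTHER loader holding the same class name becomes "doomed".
        List<String> doomed = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : resourcesByLoader.entrySet()) {
            if (!e.getKey().equals(loaderId) && e.getValue().contains(className))
                doomed.add(e.getKey());
        }

        // Step 3: release each doomed loader, which drops ALL of its classes at once.
        for (String id : doomed)
            resourcesByLoader.remove(id);

        return doomed;
    }

    /** Returns a copy of the class names currently registered for a loader. */
    public Set<String> classesOf(String loaderId) {
        Set<String> names = resourcesByLoader.get(loaderId);
        return names == null ? new HashSet<>() : new HashSet<>(names);
    }

    public static void main(String[] args) {
        DoomedLoadersSketch reg = new DoomedLoadersSketch();
        reg.register("loaderA", "com.example.MyPredicate");
        reg.register("loaderA", "java.lang.String");

        // Re-deploying one class under loaderB releases loaderA entirely.
        System.out.println(reg.register("loaderB", "com.example.MyPredicate")); // [loaderA]
        System.out.println(reg.classesOf("loaderA"));                           // []
    }
}
```

In this model java.lang.String disappears together with the predicate, even though only the predicate was re-registered, which matches the "Removed undeployed class" messages with sampleClsName=java.lang.String.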

I don't understand this concept. Why are there multiple classloaders? Why do
we undeploy the whole classloader in such cases?

I'd be grateful if someone could explain how peer class loading works in
Ignite "under the hood".

P. S. I'm looking at the sources of a fresh snapshot of Ignite 2.1.0, but
the behavior is the same with the standard Ignite 1.9.0.

P. P. S. Unfortunately, I have not managed to reproduce this issue outside
of our project yet.




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Understanding-the-mechanics-of-peer-class-loading-tp12661.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
