Denis Chudov created IGNITE-12760:
-------------------------------------

             Summary: Prevent AssertionError on message unmarshalling, when 
classLoaderId contains id of node that already left
                 Key: IGNITE-12760
                 URL: https://issues.apache.org/jira/browse/IGNITE-12760
             Project: Ignite
          Issue Type: Bug
            Reporter: Denis Chudov
            Assignee: Denis Chudov


The following assertion error triggers the failure handler and crashes the
node. It can possibly crash the whole cluster.


{code:java}
2020-02-18 14:34:09.775\[ERROR]\[query-#146129%DPL_GRID%DplGridNodeName%]\[o.a.i.i.p.cache.GridCacheIoManager]
 Failed to process message \[senderId=727757ed-4ad4-4779-bda9-081525725cce, 
msg=GridCacheQueryRequest \[id=178, 
cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module, 
type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null, 
rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false, 
incMeta=false, all=false, keepBinary=true, 
subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1, 
topVer=AffinityTopologyVersion \[topVer=97, minorTopVer=0], sendTimestamp=-1, 
receiveTimestamp=-1, super=GridCacheIdMessage \[cacheId=-1129073400, 
super=GridCacheMessage \[msgId=179, depInfo=GridDeploymentInfoBean 
\[clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED, 
userVer=0, locDepOwner=false, participants=null], 
lastAffChangedTopVer=AffinityTopologyVersion \[topVer=8, minorTopVer=6], 
err=null, skipPrepare=false]]]]
java.lang.AssertionError: null
at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:918)
at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:889)
at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130)
at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}

There is no reliable reproducer yet, but it seems that we should prevent this
situation in general, as follows:
1) Check the correctness of the message before it is sent, inside
GridCacheDeploymentManager#prepare. If the corresponding class loader is
available on the local node, we can try to fix the message by replacing the
wrong class loader with the local one.
2) Log suspicious deployments that we receive from
GridDeploymentManager#deploy; we may have obsolete deployments in caches.
3) Possibly remove this assertion. We should have this class on the sender
node and use it as the class loader id; if we don't, we will get an exception
in finishUnmarshall (Failed to peer load class) and can try to handle the
situation with GridCacheIoManager#processFailedMessage.
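
A minimal sketch of ideas 1) and 2) follows. It does not use Ignite internals:
LoaderIdSketch, LoaderId and checkBeforeSend are hypothetical names introduced
only for illustration, and the real check inside
GridCacheDeploymentManager#prepare may need to look quite different. The
sketch assumes a class loader id carries the id of the node that owns it, and
that the sender knows the set of nodes currently in the topology.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

public class LoaderIdSketch {
    /** Hypothetical stand-in for a class loader id: owning node plus a node-local counter. */
    static final class LoaderId {
        final UUID nodeId;
        final long localId;

        LoaderId(UUID nodeId, long localId) {
            this.nodeId = nodeId;
            this.localId = localId;
        }
    }

    /**
     * Validates a class loader id before the message is sent.
     *
     * @param id Class loader id about to be written into the message.
     * @param aliveNodes Ids of the nodes currently in the topology.
     * @param localLoader Id of a matching local class loader, or null if there is none.
     * @return The id to send, or empty if no valid id is available.
     */
    static Optional<LoaderId> checkBeforeSend(LoaderId id, Set<UUID> aliveNodes, LoaderId localLoader) {
        // Owner is still in the topology: send the id unchanged.
        if (aliveNodes.contains(id.nodeId))
            return Optional.of(id);

        // Idea 2: log the suspicious deployment so that obsolete cached entries can be traced.
        System.err.println("Suspicious deployment: class loader owner " + id.nodeId + " is not in topology");

        // Idea 1: fall back to the local class loader if we have one.
        return Optional.ofNullable(localLoader);
    }

    public static void main(String[] args) {
        UUID locNode = UUID.randomUUID();
        UUID leftNode = UUID.randomUUID();

        LoaderId local = new LoaderId(locNode, 1L);
        LoaderId stale = new LoaderId(leftNode, 2L);

        Optional<LoaderId> fixed = checkBeforeSend(stale, Collections.singleton(locNode), local);

        System.out.println("Replaced with local loader: " + (fixed.isPresent() && fixed.get() == local));
    }
}
```

Returning an empty Optional when no local loader exists leaves the decision to
the caller: the message could be sent without deployment info, or the failure
could be handled as described in 3).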



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
