Is Kubernetes running liveness probes that connect to GridGain ports? That could be your culprit.
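
To try this out against a test node, here is a minimal sketch of what a Kubernetes `tcpSocket` probe amounts to: open a TCP connection, send nothing, and close it again. The class name `TcpProbeSketch`, the `probe` helper and the 1-second timeout are hypothetical choices for illustration. Port 47100 is the communication port shown in your `locAddr`. Whether a bare connect like this is enough to produce the `Invalid message type` error is something to verify rather than assume.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpProbeSketch {
    /**
     * Mimics a bare TCP health check: connect, send no payload, close.
     * Returns true if the connection was accepted within the timeout.
     */
    static boolean probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions; pass the server address and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 47100;
        System.out.println("connect to " + host + ":" + port + " succeeded: "
            + probe(host, port, 1000));
    }
}
```

Running it against a server node while watching that node's log should show whether such connections line up with the errors.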

On Tue, Nov 14, 2023 at 4:57 AM Humphrey Lopez wrote:

> We have several server nodes and thick client nodes, with Ignite embedded
> in Spring Boot. The IP address of the remote node is indeed that of our
> client. Since we have deployed everything in one namespace in Kubernetes, I
> don't think anything else is sending data to the server node from that IP.
> However, we are using IgniteDataStreamer to stream data to the server.
> We had been using Ignite 2.11 for a long time (on JDK 8 and later JDK 11),
> and have just upgraded to JDK 17 and Ignite 2.15. The only thing we have
> changed lately is setting allowOverwrite = true on the IgniteDataStreamer,
> but I don't think this is the cause. With Ignite 2.15 we were getting a lot
> of warnings when it was set to false.
>
> If I take one example:
> (Server Node timestamp: November 14th 2023, 11:11:30.707)
> Error:
> o.a.i.IgniteException: Invalid message type: 2057
>     at o.a.i.i.m.c.IgniteMessageFactoryImpl.create(IgniteMessageFactoryImpl.java:133)
>     at o.a.i.s.c.t.i.GridNioServerWrapper$2.create(GridNioServerWrapper.java:813)
>     at o.a.i.i.u.n.GridDirectParser.decode(GridDirectParser.java:81)
>     at o.a.i.i.u.n.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>     at o.a.i.i.u.n.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:133)
>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>     at o.a.i.i.u.n.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3752)
>     at o.a.i.i.u.n.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
>     at o.a.i.i.u.n.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1379)
>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2526)
>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2281)
>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
>     at o.a.i.i.u.w.GridWorker.run(GridWorker.java:125)
>     at java.lang.Thread.run(Thread.java:833)
>
> Message:
> Failed to read message [msg=null, buf=java.nio.DirectByteBuffer[pos=9
> lim=334 cap=32768], reader=DirectMessageReader [state=DirectMessageState
> [pos=0, stack=[StateItem [stream=DirectByteBufferStreamImplV2
> [baseOff=139733741570896, arrOff=-1, tmpArrOff=0, valReadBytes=0,
> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], StateItem
> [stream=DirectByteBufferStreamImplV2 [baseOff=139733741570896, arrOff=-1,
> tmpArrOff=0, valReadBytes=0, tmpArrBytes=0, msgTypeDone=false, msg=null,
> mapIt=null, it=null, arrPos=-1, keyDone=false, readSize=-1, readItems=0,
> prim=0, primShift=0, uuidState=0, uuidMost=0, uuidLeast=0, uuidLocId=0],
> state=0], StateItem [stream=DirectByteBufferStreamImplV2
> [baseOff=139733741570896, arrOff=-1, tmpArrOff=0, valReadBytes=0,
> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], null, null, null, null,
> null, null, null]], protoVer=3, lastRead=true],
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=0, bytesRcvd=107237239386,
> bytesSent=43987266268, bytesRcvd0=3776389, bytesSent0=1156504, select=true,
> super=GridWorker [name=grid-nio-worker-tcp-comm-0,
> igniteInstanceName=TcpCommunicationSpi, finished=false,
> heartbeatTs=1699956690699, hashCode=167055170, interrupted=false,
> runner=grid-nio-worker-tcp-comm-0-#47%TcpCommunicationSpi%]]],
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
> readBuf=java.nio.DirectByteBuffer[pos=9 lim=334 cap=32768],
> inRecovery=GridNioRecoveryDescriptor [acked=321632, resendCnt=0,
> rcvCnt=345516, sentCnt=321662, reserved=true, lastAck=345504,
> nodeLeft=false, node=TcpDiscoveryNode
> [id=c45e0d94-8cb4-4e23-8ff3-29f573117b58,
> consistentId=c45e0d94-8cb4-4e23-8ff3-29f573117b58, addrs=ArrayList
> [client_ip, 127.0.0.1], sockAddrs=null, discPort=0, order=27, intOrder=27,
> lastExchangeTime=1699691909333, loc=false,
> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=321632, resendCnt=0,
> rcvCnt=345516, sentCnt=321662, reserved=true, lastAck=345504,
> nodeLeft=false, node=TcpDiscoveryNode
> [id=c45e0d94-8cb4-4e23-8ff3-29f573117b58,
> consistentId=c45e0d94-8cb4-4e23-8ff3-29f573117b58, addrs=ArrayList
> [client_ip 127.0.0.1], sockAddrs=null, discPort=0, order=27, intOrder=27,
> lastExchangeTime=1699691909333, loc=false,
> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> closeSocket=true,
> outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
> super=GridNioSessionImpl [locAddr=/server_ip:47100,
> rmtAddr=/client_ip:34662, createTime=1699700035875, closeTime=0,
> bytesSent=5758138431, bytesRcvd=47248795615, bytesSent0=195008,
> bytesRcvd0=1614751, sndSchedTime=1699937197520, lastSndTime=1699956690669,
> lastRcvTime=1699956690699, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=o.a.i.i.util.nio.GridDirectParser@7141a1d9, directMode=true],
> GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
>
> On the client side I don't see any errors happening around that time. I
> have also searched for warnings, but found nothing.
>
> I've seen this post, where they are also using the streaming API and
> getting similar errors:
>
> https://lists.apache.org/thread/jgf2jrp231jd5rhdbh7f5sb8gnclocl8
>
> My guess is that it may have something to do with the data streamer somehow?
>
> Humphrey
>
> On Mon, Nov 13, 2023 at 21:10, Jeremy McMillan wrote:
>
>> These errors look like something that does not speak the Ignite binary
>> protocol is connecting and sending useless stuff to your Ignite cluster.
>>
>> IgniteException: Invalid message type: 2057
>>
>> Check the configuration of the client if the host generating this traffic
>> is known, and check firewalls or monitoring tools if not.
>>
>> On Mon, Nov 13, 2023 at 8:04 AM Humphrey Lopez wrote:
>>
>>> Other errors we are seeing:
>>>
>>> Failed to read message [msg=null, buf=java.nio.DirectByteBuffer[pos=2
>>> lim=162 cap=32768], reader=DirectMessageReader [state=DirectMessageState
>>> [pos=0, stack=[StateItem [stream=DirectByteBufferStreamImplV2
>>> [baseOff=140381476511056, arrOff=-1, tmpArrOff=0, valReadBytes=0,
>>> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
>>> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
>>> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], StateItem
>>> [stream=DirectByteBufferStreamImplV2 [baseOff=140381476511056, arrOff=-1,
>>> tmpArrOff=0, valReadBytes=0, tmpArrBytes=0, msgTypeDone=false, msg=null,
>>> mapIt=null, it=null, arrPos=-1, keyDone=false, readSize=-1, readItems=0,
>>> prim=0, primShift=0, uuidState=0, uuidMost=0, uuidLeast=0, uuidLocId=0],
>>> state=0], StateItem [stream=DirectByteBufferStreamImplV2
>>> [baseOff=140381476511056, arrOff=-1, tmpArrOff=0, valReadBytes=0,
>>> tmpArrBytes=0, msgTypeDone=false, msg=null, mapIt=null, it=null, arrPos=-1,
>>> keyDone=false, readSize=-1, readItems=0, prim=0, primShift=0, uuidState=0,
>>> uuidMost=0, uuidLeast=0, uuidLocId=0], state=0], null, null, null, null,
>>> null, null, null]], protoVer=3, lastRead=true],
>>> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
>>> [super=AbstractNioClientWorker [idx=1, bytesRcvd=6506344847,
>>> bytesSent=5800573007, bytesRcvd0=5461705, bytesSent0=197830, select=true,
>>> super=GridWorker [name=grid-nio-worker-tcp-comm-1,
>>> igniteInstanceName=TcpCommunicationSpi, finished=false,
>>> heartbeatTs=1699706651957, hashCode=2094994491, interrupted=false,
>>> runner=grid-nio-worker-tcp-comm-1-#48%TcpCommunicationSpi%]]],
>>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>>> readBuf=java.nio.DirectByteBuffer[pos=2 lim=162 cap=32768],
>>> inRecovery=GridNioRecoveryDescriptor [acked=47232, resendCnt=0,
>>> rcvCnt=53951, sentCnt=47247, reserved=true, lastAck=53920, nodeLeft=false,
>>> node=TcpDiscoveryNode [id=34cfcc64-d369-415b-b14f-6ac222087232,
>>> consistentId=34cfcc64-d369-415b-b14f-6ac222087232, addrs=ArrayList
>>> [xx.xxx.xx.xxx, 127.0.0.1], sockAddrs=null, discPort=0, order=24,
>>> intOrder=24, lastExchangeTime=1699691906215, loc=false,
>>> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
>>> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
>>> outRecovery=GridNioRecoveryDescriptor [acked=47232, resendCnt=0,
>>> rcvCnt=53951, sentCnt=47247, reserved=true, lastAck=53920, nodeLeft=false,
>>> node=TcpDiscoveryNode [id=34cfcc64-d369-415b-b14f-6ac222087232,
>>> consistentId=34cfcc64-d369-415b-b14f-6ac222087232, addrs=ArrayList
>>> [xx.xxx.xx.xxx, 127.0.0.1], sockAddrs=null, discPort=0, order=24,
>>> intOrder=24, lastExchangeTime=1699691906215, loc=false,
>>> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
>>> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
>>> closeSocket=true,
>>> outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
>>> super=GridNioSessionImpl [locAddr=/xx.xxx.xx.xx:47100,
>>> rmtAddr=/xx.xxx.xx.xxx:35492, createTime=1699700043744, closeTime=0,
>>> bytesSent=74190856, bytesRcvd=167712723, bytesSent0=0, bytesRcvd0=5260541,
>>> sndSchedTime=1699700043744, lastSndTime=1699706650143,
>>> lastRcvTime=1699706651957, readsPaused=false,
>>> filterChain=FilterChain[filters=[GridNioCodecFilter
>>> [parser=o.a.i.i.util.nio.GridDirectParser@6c311b05, directMode=true],
>>> GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
>>>
>>> o.a.i.IgniteException: Invalid message type: 2057
>>>     at o.a.i.i.m.c.IgniteMessageFactoryImpl.create(IgniteMessageFactoryImpl.java:133)
>>>     at o.a.i.s.c.t.i.GridNioServerWrapper$2.create(GridNioServerWrapper.java:813)
>>>     at o.a.i.i.u.n.GridDirectParser.decode(GridDirectParser.java:81)
>>>     at o.a.i.i.u.n.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:113)
>>>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>>>     at o.a.i.i.u.n.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:133)
>>>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>>>     at o.a.i.i.u.n.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3752)
>>>     at o.a.i.i.u.n.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
>>>     at o.a.i.i.u.n.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1379)
>>>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2526)
>>>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2281)
>>>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
>>>     at o.a.i.i.u.w.GridWorker.run(GridWorker.java:125)
>>>     at java.lang.Thread.run(Thread.java:833)
>>>
>>> On Mon, Nov 13, 2023 at 14:33, Humphrey Lopez wrote:
>>>
>>>> Hello Ignite community.
>>>>
>>>> We are running Ignite 2.15 in production with JDK 17. We are seeing the
>>>> following errors and have no idea what is causing them.
>>>>
>>>> Failed to process selector key [ses=GridSelectorNioSessionImpl
>>>> [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3,
>>>> bytesRcvd=64732383766, bytesSent=30081901336, bytesRcvd0=0, bytesSent0=0,
>>>> select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3,
>>>> igniteInstanceName=TcpCommunicationSpi, finished=false,
>>>> heartbeatTs=1699879571052, hashCode=475467093, interrupted=false,
>>>> runner=grid-nio-worker-tcp-comm-3-#50%TcpCommunicationSpi%]]],
>>>> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
>>>> readBuf=java.nio.DirectByteBuffer[pos=10 lim=282 cap=32768],
>>>> inRecovery=GridNioRecoveryDescriptor [acked=209920, resendCnt=0,
>>>> rcvCnt=229599, sentCnt=209938, reserved=true, lastAck=229568,
>>>> nodeLeft=false, node=TcpDiscoveryNode
>>>> [id=c8353de9-9cd2-4ae5-bc48-3271c47fffae,
>>>> consistentId=c8353de9-9cd2-4ae5-bc48-3271c47fffae, addrs=ArrayList
>>>> [machine1, 127.0.0.1], sockAddrs=null, discPort=0, order=26, intOrder=26,
>>>> lastExchangeTime=1699691907837, loc=false,
>>>> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
>>>> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
>>>> outRecovery=GridNioRecoveryDescriptor [acked=209920, resendCnt=0,
>>>> rcvCnt=229599, sentCnt=209938, reserved=true, lastAck=229568,
>>>> nodeLeft=false, node=TcpDiscoveryNode
>>>> [id=c8353de9-9cd2-4ae5-bc48-3271c47fffae,
>>>> consistentId=c8353de9-9cd2-4ae5-bc48-3271c47fffae, addrs=ArrayList
>>>> [machine1, 127.0.0.1], sockAddrs=null, discPort=0, order=26, intOrder=26,
>>>> lastExchangeTime=1699691907837, loc=false,
>>>> ver=2.15.0#20230425-sha1:f98f7f35, isClient=true], connected=true,
>>>> connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false],
>>>> closeSocket=true,
>>>> outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
>>>> super=GridNioSessionImpl [locAddr=/10.129.34.235:47100,
>>>> rmtAddr=/machine1:42492, createTime=1699700047871, closeTime=0,
>>>> bytesSent=3311788770, bytesRcvd=23387281236, bytesSent0=0, bytesRcvd0=0,
>>>> sndSchedTime=1699852982273, lastSndTime=1699879565624,
>>>> lastRcvTime=1699879571052, readsPaused=false,
>>>> filterChain=FilterChain[filters=[GridNioCodecFilter
>>>> [parser=o.a.i.i.util.nio.GridDirectParser@f6e5016, directMode=true],
>>>> GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
>>>>
>>>> And there is also this stack trace:
>>>> j.l.NullPointerException: Cannot invoke "Object.hashCode()" because "key" is null
>>>>     at j.u.c.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
>>>>     at o.a.i.i.m.c.GridIoManager.processOrderedMessage(GridIoManager.java:1707)
>>>>     at o.a.i.i.m.c.GridIoManager.onMessage0(GridIoManager.java:1328)
>>>>     at o.a.i.i.m.c.GridIoManager.access$300(GridIoManager.java:243)
>>>>     at o.a.i.i.m.c.GridIoManager$2.onMessage(GridIoManager.java:509)
>>>>     at o.a.i.s.c.t.TcpCommunicationSpi.notifyListener(TcpCommunicationSpi.java:1220)
>>>>     at o.a.i.s.c.t.TcpCommunicationSpi$1.onMessage(TcpCommunicationSpi.java:689)
>>>>     at o.a.i.s.c.t.TcpCommunicationSpi$1.onMessage(TcpCommunicationSpi.java:687)
>>>>     at o.a.i.s.c.t.i.InboundConnectionHandler.onMessage(InboundConnectionHandler.java:392)
>>>>     at o.a.i.s.c.t.i.InboundConnectionHandler.onMessage(InboundConnectionHandler.java:78)
>>>>     at o.a.i.i.u.n.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
>>>>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>>>>     at o.a.i.i.u.n.GridNioCodecFilter.onMessageReceived(GridNioCodecFilter.java:116)
>>>>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>>>>     at o.a.i.i.u.n.GridConnectionBytesVerifyFilter.onMessageReceived(GridConnectionBytesVerifyFilter.java:88)
>>>>     at o.a.i.i.u.n.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
>>>>     at o.a.i.i.u.n.GridNioServer$HeadFilter.onMessageReceived(GridNioServer.java:3752)
>>>>     at o.a.i.i.u.n.GridNioFilterChain.onMessageReceived(GridNioFilterChain.java:175)
>>>>     at o.a.i.i.u.n.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1379)
>>>>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2526)
>>>>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2281)
>>>>     at o.a.i.i.u.n.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
>>>>     at o.a.i.i.u.w.GridWorker.run(GridWorker.java:125)
>>>>     at java.lang.Thread.run(Thread.java:833)
>>>>
>>>> We are using the following flags:
>>>>
>>>> "-XX:+AlwaysPreTouch",
>>>> "-XX:+UseG1GC",
>>>> "-XX:+ScavengeBeforeFullGC",
>>>> "-XX:+DisableExplicitGC",
>>>> "-XX:MaxMetaspaceSize=640m",
>>>> "-Djava.net.preferIPv4Stack=true",
>>>> "-DIGNITE_QUIET=false",
>>>> "-DIGNITE_UPDATE_NOTIFIER=false",
>>>> "-DIGNITE_WAIT_FOR_BACKUPS_ON_SHUTDOWN=true",
>>>> "--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED",
>>>> "--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED",
>>>> "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED",
>>>> "--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED",
>>>> "--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED",
>>>> "--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.io=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.nio=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.util=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.lang=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.time=ALL-UNNAMED",
>>>> "--add-opens=java.base/java.lang.invoke=ALL-UNNAMED"
>>>>
>>>> Is there anything I'm missing? The error is happening on a server node,
>>>> but it looks like the remote address is a client node? I see isClient = true
>>>> in the message. What does it mean and what can we do to fix it?
>>>>
>>>> Humphrey
