Hello,

I'm running a 2.9.0 cluster with 2 nodes. I tried to use GridGain's ControlCenterAgent to investigate a slowdown.
When I removed the agent files from the server (I don't like having to put them on all clients), the second node could no longer join the cluster when I started it. If I start node A, then node B, node B fails; if I start node B, then node A, node A fails. If I put the agent files back, all nodes can start, but clients fail because they don't have the agent classes themselves.

When a node fails to start, it prints this log:

[17:52:45,265][INFO][tcp-disco-sock-reader-[2f3f6f3a 192.168.43.29:39675]-#6%ClusterWA%-#50%ClusterWA%][TcpDiscoverySpi] Initialized connection with remote server node [nodeId=2f3f6f3a-accb-4708-a5cc-26d324a07816, rmtAddr=/192.168.43.29:39675]
[17:52:45,268][SEVERE][main][IgniteKernal%ClusterWA] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@39a8e2fa], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:967)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1935)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1298)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1032)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:918)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:817)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:687)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
    at org.apache.ignite.Ignition.start(Ignition.java:353)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Unable to unmarshal key=metastorage.cluster.id.tag
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2018)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1189)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:462)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2120)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
    ... 13 more
[17:52:45,271][SEVERE][main][IgniteKernal%ClusterWA] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1940)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1298)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2046)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1698)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1114)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1032)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:918)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:817)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:687)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:656)
    at org.apache.ignite.Ignition.start(Ignition.java:353)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@39a8e2fa], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:302)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:967)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1935)
    ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Unable to unmarshal key=metastorage.cluster.id.tag
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2018)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1189)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:462)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2120)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
    ... 13 more
[17:52:45,271][INFO][tcp-disco-sock-reader-[2f3f6f3a 192.168.43.29:39675]-#6%ClusterWA%-#50%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.29:39675, rmtPort=39675

And the running node has this:

[17:52:45,223][INFO][tcp-disco-sock-reader-[9a3233c6 192.168.43.30:54951]-#4%ClusterWA%-#55%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.30:54951, rmtPort=54951
[17:52:45,246][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%][GridEncryptionManager] Joining node doesn't have stored group keys [node=9a3233c6-3a6c-4be0-b5e7-19cdff30f69e]
[17:52:45,266][WARNING][disco-pool-#56%ClusterWA%][TcpDiscoverySpi] Unable to unmarshal key=metastorage.cluster.id.tag

If I start the nodes in the reverse order, the running node has this:

[17:56:52,426][INFO][tcp-disco-sock-reader-[4b8b92f5 192.168.43.29:42557]-#4%ClusterWA%-#53%ClusterWA%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.43.29:42557, rmtPort=42557
[17:56:52,446][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%][GridEncryptionManager] Joining node doesn't have stored group keys [node=4b8b92f5-1753-4b1b-9902-476c925fa49d]
[17:56:52,466][WARNING][disco-pool-#54%ClusterWA%][TcpDiscoverySpi] Unable to unmarshal key=metastorage.cluster.id.tag

Is there a way to recover?
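
For anyone who wants to compare the logs of both nodes, here is a minimal Python sketch that pulls the unmarshal failures out of a log. The helper name `find_unmarshal_failures` and both regular expressions are assumptions based only on the log lines quoted above; nothing here is part of Ignite itself.

```python
import re

# Assumed pattern: matches the "Unable to unmarshal key=..." message seen in the logs above.
UNMARSHAL_RE = re.compile(r"Unable to unmarshal key=([\w.\-]+)")
# Assumed pattern: matches the leading "[time][LEVEL]" prefix of a log line.
PREFIX_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2},\d{3})\]\[(\w+)\]")


def find_unmarshal_failures(lines):
    """Return (timestamp, level, key) for every line reporting an unmarshal failure.

    timestamp and level are None for lines without a log prefix,
    such as "Caused by:" lines inside a stack trace.
    """
    hits = []
    for line in lines:
        key = UNMARSHAL_RE.search(line)
        if key is None:
            continue
        prefix = PREFIX_RE.match(line)
        timestamp, level = prefix.groups() if prefix else (None, None)
        hits.append((timestamp, level, key.group(1)))
    return hits


sample = [
    "[17:52:45,246][INFO][tcp-disco-msg-worker-[crd]-#2%ClusterWA%-#46%ClusterWA%]"
    "[GridEncryptionManager] Joining node doesn't have stored group keys "
    "[node=9a3233c6-3a6c-4be0-b5e7-19cdff30f69e]",
    "[17:52:45,266][WARNING][disco-pool-#56%ClusterWA%][TcpDiscoverySpi] "
    "Unable to unmarshal key=metastorage.cluster.id.tag",
    "Caused by: class org.apache.ignite.spi.IgniteSpiException: "
    "Unable to unmarshal key=metastorage.cluster.id.tag",
]

for hit in find_unmarshal_failures(sample):
    print(hit)
```

In the logs above, the only key reported this way is metastorage.cluster.id.tag, on both the joining node and the running node.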

Thanks,

-- 
Bastien Durel
DATA
Enterprise data integration, business intelligence systems.

bastien.du...@data.fr
tel : +33 (0) 1 57 19 59 28
fax : +33 (0) 1 57 19 59 73
45 avenue Carnot, 94230 CACHAN France
www.data.fr