Hi, getting back to this topic in more detail. The container meant here is a normal Java VM running a specific middleware which offers a "container" concept. It has nothing to do with docker.
The config is read from disc. Best regards Stefan -----Original Message----- From: Yuval Dori [mailto:[email protected]] Sent: Dienstag, 10. April 2018 14:34 To: [email protected] Cc: LEONHARTSBERGER Stefan Subject: Re: IllegalStateException: instance must be started before calling this method Hi, I have answered the following at March 26: Hi Martin, Not in a Docker. We are using the word container for one of our services. Thanks, Yuval > On Apr 10, 2018, at 2:40 PM, Martin Gainty <[email protected]> wrote: > > MG>waiting for Yuval to answer: > > > in docker container? > if so how do you resolve hosts (.hosts file, kubernetes, swarm)? > > > MG>? > > ________________________________ > From: VALLASTER Stefan <[email protected]> > Sent: Tuesday, April 10, 2018 4:46 AM > To: [email protected] > Subject: RE: IllegalStateException: instance must be started before calling > this method > > How can we get this topic forward again? It seems the mail thread is > currently stalled for no obvious reason. Anyone willing to help us out here > again? > > Best regards > Stefan > > -----Original Message----- > From: Martin Gainty [mailto:[email protected]] > Sent: Sonntag, 25. März 2018 22:58 > To: [email protected] > Subject: Re: IllegalStateException: instance must be started before calling > this method > > > > in docker container? > > if so how do you resolve hosts (.hosts file, kubernetes, swarm)? > > M > ________________________________ > From: Yuval Dori <[email protected]> > Sent: Sunday, March 25, 2018 5:34 AM > To: [email protected] > Cc: Yuval Dori > Subject: IllegalStateException: instance must be started before calling this > method > > Hi, > > Our customer is using ZK 3.4.3 and got IllegalStateException: instance must > be started before calling this method > > > with the following stack trace: > 2018-02-15T09:58:12.094+0100; ERROR; WSOSTSLXWIT01/MANAGER; P3424/T194; > [SPACE/LearnerHandler-/10.17.46.142:49336/LearnerHandler<http://10.17.46.142:49336/LearnerHandler>]; > Unexpected exception causing shutdown while sock still open > java.net<http://java.net>.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) > at > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:542) > > > This is the description: > > At first ps-main.1 is started as primary on WSOSTSLXBMS01 GSC 3. > > 24.01.2018 11:31:04,947 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/1 Space instance is Primary > > After some relocation of ps-main instances a new backup instance is started > on the same GSC. > > 24.01.2018 11:32:59,886 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T146 > SPACE/GS-LRMI Connection-pool-1-thread-6/1 Space instance [ > frqMain_container1:frqMain ] has been selected as Backup > > Later a failover test is performed by shutting down WSOSTSLXBMS02 and > WSOSTSLXMSS02. > We would expect that ps-main.1 backup will be elected as primary because the > former primary was located at WSOSTSLXBMS02 but no election happend. In the > end in the whole cluster there was no primary of ps-main.1 running. > > We found suspicious exception in the logs: > 24.01.2018 11:32:44,987 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/frqMain_container1:frqMain Lost leadership > [sessionId=0x36128253b820001] > 24.01.2018 11:32:44,987 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T98 > SPACE/Curator-Framework-0/CuratorFrameworkImpl backgroundOperationsLoop > exiting > 24.01.2018 11:32:44,990 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/1 Quiesce state set to SUSPENDED > 24.01.2018 11:32:44,991 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/ZooKeeper Session: 0x36128253b820001 > closed > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector The leader threw an exception > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector > java.lang.IllegalStateException: instance must be started before calling this > method > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.shaded.com<http://org.apache.curator.shaded.com>.google.common.base.Preconditions.checkState(Preconditions.java:176) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:359) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:339) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:123) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector at > java.lang.Thread.run(Thread.java:748) > 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 > SPACE/Curator-LeaderSelector-0/LeaderSelector > 24.01.2018 11:32:44,991 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T97 > SPACE/GS-LRMI Connection-pool-1-thread-1-EventThread/ClientCnxn EventThread > shut down for session: 0x36128253b820001 > 24.01.2018 11:32:44,997 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/frqMain1_1 Channel is closed > [target=frqMain1_1, target > url=jini://*/frqMain_container1_1/frqMain?groups=POLBAY&ignoreValidation=true&schema=persistent&id=1&total_members=4,1&versioned=true&mirror=true&cluster_schema=partitioned&locators=WSOSTSLXMSS01,WSOSTSLXMSS02&state=started&timeout=5000, > target machine connection > url=NIO://WSOSTSLXBMS02:11002/pid[3756]/24795468496_3_2538032931939568226_details[class > > com.gigaspaces.internal.cluster.node.impl.router.AbstractConnectionProxyBasedReplicationRouter$ConnectionEndpoint(frqMain_container1_1:frqMain)]] > 24.01.2018 11:32:44,998 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/ps-main-mirror-service Channel is > closed [target=ps-main-mirror-service, target > url=jini://*/ps-main-mirror-service_container/ps-main-mirror-service?schema=persistent&id=1&total_members=4,1&versioned=true&mirror=true&groups=POLBAY&cluster_schema=partitioned&locators=WSOSTSLXMSS01,WSOSTSLXMSS02&state=started&timeout=5000, > target machine connection > url=NIO://WSOSTSLXBMS01:11002/pid[3220]/25906827215_3_7196393537085178059_details[class > > com.gigaspaces.internal.cluster.node.impl.router.AbstractConnectionProxyBasedReplicationRouter$ConnectionEndpoint(ps-main-mirror-service_container:ps-main-mirror-service)]] > 24.01.2018 11:32:45,767 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/1 Shutdown complete > 24.01.2018 11:32:45,768 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/container Container > <frqMain_container1> shutdown completed > > 24.01.2018 11:32:46,037 +01:00 WARN WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/PUServiceBeanImpl Failed to delete > deployed processing unit from [C:\Program Files\Frequentis\Frequentis LifeX > Platform\gigaspaces\work\processing-units\ps-main_1_2008048111] > 24.01.2018 11:32:46,037 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 > SPACE/GS-LRMI Connection-pool-1-thread-1/PUServiceBeanImpl Stopped > > Attached thread dump. Failover happend around 24.01.2018 11:40:20,893 +01:00 > WARN WSOSTSLXBMS01/GSC/3 P1280/T59 SPACE/GS-Watchdog/watchdog The > ServerEndPoint [WSOSTSLXMSS02/10.17.46.142:11001<http://10.17.46.142:11001/>] > is not reachable or is reachable but with no matching invocation in progress > at the server peer, closing invalid connection with local address > [/10.17.46.136:49864<http://10.17.46.136:49864/>] [ remote invocation of: > net.jini.lookup.DiscoveryAdmin.getMemberGroups] > [java.net<http://java.net>.SocketTimeoutException] > > > Could it be ZK bug? > > Thanks, > > Yuval > >
