How can we move this topic forward again? The mail thread seems to have stalled 
for no obvious reason. Is anyone willing to help us out here again?

Best regards
Stefan

-----Original Message-----
From: Martin Gainty [mailto:mgai...@hotmail.com] 
Sent: Sunday, 25 March 2018 22:58
To: user@zookeeper.apache.org
Subject: Re: IllegalStateException: instance must be started before calling 
this method



Is this running in a Docker container?

If so, how do you resolve hosts (a hosts file, Kubernetes, Swarm)?

M
________________________________
From: Yuval Dori <yuval.d...@gigaspaces.com>
Sent: Sunday, March 25, 2018 5:34 AM
To: user@zookeeper.apache.org
Cc: Yuval Dori
Subject: IllegalStateException: instance must be started before calling this 
method

Hi,

Our customer is using ZK 3.4.3 and got an IllegalStateException ("instance must 
be started before calling this method") with the following stack trace:
2018-02-15T09:58:12.094+0100; ERROR; WSOSTSLXWIT01/MANAGER; P3424/T194; 
[SPACE/LearnerHandler-/10.17.46.142:49336/LearnerHandler]; Unexpected exception 
causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at 
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:542)


This is the description:

At first, ps-main.1 is started as primary on WSOSTSLXBMS01 GSC 3.

24.01.2018 11:31:04,947 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/1 Space instance is Primary

After some relocation of ps-main instances, a new backup instance is started on 
the same GSC.

24.01.2018 11:32:59,886 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T146 
SPACE/GS-LRMI Connection-pool-1-thread-6/1 Space instance [ 
frqMain_container1:frqMain ] has been selected as Backup

Later, a failover test is performed by shutting down WSOSTSLXBMS02 and 
WSOSTSLXMSS02.
We would expect the ps-main.1 backup to be elected as primary, because the 
former primary was located on WSOSTSLXBMS02, but no election happened. In the 
end there was no primary of ps-main.1 running anywhere in the cluster.
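
For reference, the primary/backup election visible in these logs 
("Curator-LeaderSelector-0") is driven by Curator's LeaderSelector recipe on top 
of ZooKeeper. Below is a minimal sketch of that pattern, just to show what 
"being elected primary" means at the ZooKeeper level; the connect string and 
election path are invented placeholders, not taken from our actual GigaSpaces 
configuration.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class BackupElectionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string -- not the customer's actual ensemble.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host-1:2181,zk-host-2:2181,zk-host-3:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderSelector selector = new LeaderSelector(client, "/election/ps-main",
                new LeaderSelectorListenerAdapter() {
                    @Override
                    public void takeLeadership(CuratorFramework c) throws Exception {
                        // Called when this instance wins the election, i.e. becomes
                        // primary. Leadership lasts as long as this method blocks;
                        // returning or throwing releases it, which is what the
                        // "Lost leadership" log line marks.
                        System.out.println("elected primary");
                        Thread.currentThread().join();
                    }
                });
        selector.autoRequeue(); // re-enter the election whenever leadership is lost
        selector.start();
    }
}

When a surviving backup's selector is still registered and the old primary's 
ZooKeeper session goes away, the backup's takeLeadership() should fire; that is 
the step that apparently never happened in our test.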

We found a suspicious exception in the logs:
24.01.2018 11:32:44,987 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/frqMain_container1:frqMain Lost leadership 
[sessionId=0x36128253b820001]
24.01.2018 11:32:44,987 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T98 
SPACE/Curator-Framework-0/CuratorFrameworkImpl backgroundOperationsLoop exiting
24.01.2018 11:32:44,990 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/1 Quiesce state set to SUSPENDED
24.01.2018 11:32:44,991 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/ZooKeeper Session: 0x36128253b820001 closed
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector The leader threw an exception
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector java.lang.IllegalStateException: 
instance must be started before calling this method
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:359)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:339)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:123)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector at 
java.lang.Thread.run(Thread.java:748)
24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 
SPACE/Curator-LeaderSelector-0/LeaderSelector
24.01.2018 11:32:44,991 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T97 SPACE/GS-LRMI 
Connection-pool-1-thread-1-EventThread/ClientCnxn EventThread shut down for 
session: 0x36128253b820001
24.01.2018 11:32:44,997 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/frqMain1_1 Channel is closed [target=frqMain1_1, 
target 
url=jini://*/frqMain_container1_1/frqMain?groups=POLBAY&ignoreValidation=true&schema=persistent&id=1&total_members=4,1&versioned=true&mirror=true&cluster_schema=partitioned&locators=WSOSTSLXMSS01,WSOSTSLXMSS02&state=started&timeout=5000,
 target machine connection 
url=NIO://WSOSTSLXBMS02:11002/pid[3756]/24795468496_3_2538032931939568226_details[class
 
com.gigaspaces.internal.cluster.node.impl.router.AbstractConnectionProxyBasedReplicationRouter$ConnectionEndpoint(frqMain_container1_1:frqMain)]]
24.01.2018 11:32:44,998 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/ps-main-mirror-service Channel is closed 
[target=ps-main-mirror-service, target 
url=jini://*/ps-main-mirror-service_container/ps-main-mirror-service?schema=persistent&id=1&total_members=4,1&versioned=true&mirror=true&groups=POLBAY&cluster_schema=partitioned&locators=WSOSTSLXMSS01,WSOSTSLXMSS02&state=started&timeout=5000,
 target machine connection 
url=NIO://WSOSTSLXBMS01:11002/pid[3220]/25906827215_3_7196393537085178059_details[class
 
com.gigaspaces.internal.cluster.node.impl.router.AbstractConnectionProxyBasedReplicationRouter$ConnectionEndpoint(ps-main-mirror-service_container:ps-main-mirror-service)]]
24.01.2018 11:32:45,767 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/1 Shutdown complete
24.01.2018 11:32:45,768 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/container Container <frqMain_container1> shutdown 
completed

24.01.2018 11:32:46,037 +01:00 WARN WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/PUServiceBeanImpl Failed to delete deployed 
processing unit from [C:\Program Files\Frequentis\Frequentis LifeX 
Platform\gigaspaces\work\processing-units\ps-main_1_2008048111]
24.01.2018 11:32:46,037 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI 
Connection-pool-1-thread-1/PUServiceBeanImpl Stopped
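
As far as we can tell from the trace above, the IllegalStateException is raised 
by Curator's own state check rather than by the ZooKeeper server: 
CuratorFrameworkImpl.delete() guards itself with the Preconditions.checkState() 
call shown in the trace, and that guard fails when the CuratorFramework 
instance is no longer in the STARTED state. The log shows session 
0x36128253b820001 being closed in the same millisecond, which suggests the 
client was shut down while the LeaderSelector was still deleting its lock node. 
A hedged sketch of a shutdown ordering that would avoid this (class and method 
names are ours, for illustration only):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.imps.CuratorFrameworkState;
import org.apache.curator.framework.recipes.leader.LeaderSelector;

// Illustration only: close the leader-election recipe while the client can
// still talk to ZooKeeper, and only then close the client itself. If the
// client is closed (or its session torn down) first, the recipe's cleanup
// calls CuratorFrameworkImpl.delete() on a stopped instance and trips
// "instance must be started before calling this method", as seen above.
public final class ShutdownOrderSketch {
    public static void shutdown(LeaderSelector selector, CuratorFramework client) {
        selector.close(); // releases leadership and deletes the lock znode

        if (client.getState() == CuratorFrameworkState.STARTED) {
            client.close(); // stop the client only after the recipe is gone
        }
    }
}

That ordering is under the control of the hosting process (GigaSpaces in our 
case), so we would appreciate confirmation from someone with more Curator 
experience.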

A thread dump is attached. The failover happened around 24.01.2018 11:40:20,893 
+01:00:
WARN WSOSTSLXBMS01/GSC/3 P1280/T59 SPACE/GS-Watchdog/watchdog The ServerEndPoint 
[WSOSTSLXMSS02/10.17.46.142:11001] is not reachable or is reachable but with no 
matching invocation in progress at the server peer, closing invalid connection 
with local address [/10.17.46.136:49864] [remote invocation of: 
net.jini.lookup.DiscoveryAdmin.getMemberGroups] [java.net.SocketTimeoutException]


Could it be a ZK bug?

Thanks,

Yuval

