[ 
https://issues.apache.org/jira/browse/RATIS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved RATIS-982.
---------------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

> Fix RaftServerImpl illegal transition from RUNNING to RUNNING
> -------------------------------------------------------------
>
>                 Key: RATIS-982
>                 URL: https://issues.apache.org/jira/browse/RATIS-982
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>             Fix For: 0.6.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> This happens in tests, but it may also happen in production.
> For example, suppose the leader is s3 and a follower is s4.
> 1. Kill s4, then restart it.
> {code:java}
> 2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster 
> (MiniRaftCluster.java:killServer(458)) - killServer s4
> 2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster 
> (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, 
> group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, 
> s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
> {code}
> 2. s4 starts and sets its configuration from storage at 
> [setRaftConf(raftConf.getLogEntryIndex(), raftConf) 
> |https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170]
>  and s4 then changes to RUNNING at 
> [lifeCycle.transition(RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213]
> {code:java}
> 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  impl.RaftServerImpl 
> (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set 
> configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, 
> s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0
> 2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a 
> follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, 
> s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null
> 2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl 
> (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from 
>      null to FOLLOWER at term 1 for startAsFollower
> {code}
> 3. s3 sends an appendEntries request to s4, and s4 changes to RUNNING at 
> [lifeCycle.compareAndTransition(STARTING, 
> RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003]
> {code:java}
> 2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl 
> (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive 
> appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, 
> s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, 
> client-9414EC4E73DA, cid=3000
> {code}
> 4. If the change to RUNNING in step 3 happens before the transition in step 2, 
> then step 2 throws an IllegalStateException, because RUNNING -> RUNNING is not 
> a legal transition.
> {code:java}
> 2020-06-19 07:03:18,169 [Thread-6194] INFO  impl.RoleInfo 
> (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState
> 2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty 
> (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to 
> kill/restart server: s4
> 2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL 
> TRANSITION: In s4, RUNNING -> RUNNING
> 2020-06-19T07:03:18.1918899Z  at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
> 2020-06-19T07:03:18.1919240Z  at 
> org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
> 2020-06-19T07:03:18.1919558Z  at 
> org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
> 2020-06-19T07:03:18.1919878Z  at 
> org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214)
> 2020-06-19T07:03:18.1920206Z  at 
> org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186)
> 2020-06-19T07:03:18.1920520Z  at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> 2020-06-19T07:03:18.1920839Z  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> 2020-06-19T07:03:18.1921330Z  at 
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> 2020-06-19T07:03:18.1921639Z  at 
> java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
> 2020-06-19T07:03:18.1921951Z  at 
> java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
> 2020-06-19T07:03:18.1922261Z  at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> 2020-06-19T07:03:18.1922575Z  at 
> java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
> 2020-06-19T07:03:18.1922885Z  at 
> java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
> 2020-06-19T07:03:18.1925464Z  at 
> java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
> 2020-06-19T07:03:18.1940816Z  at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
> 2020-06-19T07:03:18.1953283Z  at 
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
> 2020-06-19T07:03:18.1967610Z  at 
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> 2020-06-19T07:03:18.1980549Z  at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
> 2020-06-19T07:03:18.1991620Z  at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301)
> 2020-06-19T07:03:18.1991958Z  at 
> org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:312)
> 2020-06-19T07:03:18.1992275Z  at 
> org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:304)
> 2020-06-19T07:03:18.1992609Z  at 
> org.apache.ratis.RaftBasicTests.lambda$killAndRestartServer$2(RaftBasicTests.java:100)
> 2020-06-19T07:03:18.1992920Z  at java.lang.Thread.run(Thread.java:748)
> {code}
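
The ordering described in steps 2 to 4 can be reproduced deterministically with a small model. The sketch below is only an illustration: `SimpleLifeCycle`, its `State` enum, and the class name `LifeCycleRaceSketch` are hypothetical stand-ins, not the Ratis `LifeCycle` implementation. It assumes that an unconditional `transition(RUNNING)` rejects a state that is already RUNNING (as the stack trace suggests) and that `compareAndTransition(STARTING, RUNNING)` only reports whether it won. Using the compare-and-set form on the start path is shown as one possible way to tolerate the race; it is not necessarily the change committed for this issue.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the race; not the Ratis LifeCycle class.
final class LifeCycleRaceSketch {
    enum State { STARTING, RUNNING }

    static final class SimpleLifeCycle {
        private final AtomicReference<State> current =
                new AtomicReference<>(State.STARTING);

        // Unconditional transition: assumed to reject "same state" moves,
        // mirroring the "ILLEGAL TRANSITION: RUNNING -> RUNNING" error.
        void transition(State to) {
            current.updateAndGet(from -> {
                if (from == to) {
                    throw new IllegalStateException(
                            "ILLEGAL TRANSITION: " + from + " -> " + to);
                }
                return to;
            });
        }

        // Conditional transition: succeeds only if the state is still 'from'.
        boolean compareAndTransition(State from, State to) {
            return current.compareAndSet(from, to);
        }

        State get() {
            return current.get();
        }
    }

    public static void main(String[] args) {
        SimpleLifeCycle lifeCycle = new SimpleLifeCycle();

        // Step 3 runs first: the appendEntries path moves STARTING -> RUNNING.
        boolean wonByAppend =
                lifeCycle.compareAndTransition(State.STARTING, State.RUNNING);
        System.out.println("appendEntries path won the transition: " + wonByAppend);

        // Step 2 runs second: the start path transitions unconditionally and fails.
        try {
            lifeCycle.transition(State.RUNNING);
        } catch (IllegalStateException e) {
            System.out.println("start path failed: " + e.getMessage());
        }

        // One possible tolerant variant: the start path also uses compare-and-set
        // and treats "already RUNNING" as a no-op instead of an error.
        boolean wonByStart =
                lifeCycle.compareAndTransition(State.STARTING, State.RUNNING);
        System.out.println("start path won the transition: " + wonByStart
                + ", state = " + lifeCycle.get());
    }
}
```

In this model the second caller of the compare-and-set form simply loses the race and leaves the state at RUNNING, whereas the unconditional transition turns the same ordering into an exception.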



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
