Hi

The controller log doesn't say much; it only contains the lines below. The only thing I can tell from it is that things were fine at startup, and some partitions had broker 3 as their leader, but after that there is nothing more to see. (Sorry for the long log; I don't know which parts matter for dissecting the issue, so I have included everything I can.)

[2014-05-23 10:30:55,623] INFO [Controller 2]: Broker 2 starting become controller state transition (kafka.controller.KafkaController)
[2014-05-23 10:30:55,665] INFO [Controller 2]: Controller 2 incremented epoch to 2 (kafka.controller.KafkaController)
[2014-05-23 10:30:56,584] DEBUG [Channel manager on controller 2]: Controller 2 trying to connect to broker 1 (kafka.controller.ControllerChannelManager)
[2014-05-23 10:30:56,601] DEBUG [Channel manager on controller 2]: Controller 2 trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2014-05-23 10:30:56,638] INFO [Controller-2-to-broker-2-send-thread], Starting (kafka.controller.RequestSendThread)
[2014-05-23 10:30:56,639] INFO [Controller-2-to-broker-1-send-thread], Starting (kafka.controller.RequestSendThread)
[2014-05-23 10:30:56,659] INFO [Controller 2]: Currently active brokers in the cluster: Set(1, 2) (kafka.controller.KafkaController)
[2014-05-23 10:30:56,694] INFO [Controller 2]: Currently shutting brokers in the cluster: Set() (kafka.controller.KafkaController)
[2014-05-23 10:30:56,726] INFO [Controller 2]: Current list of topics in the cluster: Set(topic1) (kafka.controller.KafkaController)
[2014-05-23 10:30:56,785] INFO [Replica state machine on controller 2]: Invoking state change to OnlineReplica for replicas PartitionAndReplica(topic1,1,2),PartitionAndReplica(topic1,5,1),PartitionAndReplica(topic1,6,2),PartitionAndReplica(topic1,9,2),PartitionAndReplica(topic1,11,2),PartitionAndReplica(topic1,8,1),PartitionAndReplica(topic1,11,1),PartitionAndReplica(topic1,0,2),PartitionAndReplica(topic1,2,1),PartitionAndReplica(topic1,7,2),PartitionAndReplica(topic1,3,2),PartitionAndReplica(topic1,4,1),PartitionAndReplica(topic1,5,2),PartitionAndReplica(topic1,6,1),PartitionAndReplica(topic1,0,1),PartitionAndReplica(topic1,10,1) (kafka.controller.ReplicaStateMachine)
[2014-05-23 10:30:56,953] INFO [Replica state machine on controller 2]: Started replica state machine with initial state -> Map((topic1,10,1) -> OnlineReplica, (topic1,0,1) -> OnlineReplica, (topic1,6,1) -> OnlineReplica, (topic1,2,0) -> OfflineReplica, (topic1,3,0) -> OfflineReplica, (topic1,5,2) -> OnlineReplica, (topic1,4,1) -> OnlineReplica, (topic1,7,2) -> OnlineReplica, (topic1,3,2) -> OnlineReplica, (topic1,8,0) -> OfflineReplica, (topic1,0,2) -> OnlineReplica, (topic1,2,1) -> OnlineReplica, (topic1,1,0) -> OfflineReplica, (topic1,11,1) -> OnlineReplica, (topic1,9,0) -> OfflineReplica, (topic1,8,1) -> OnlineReplica, (topic1,11,2) -> OnlineReplica, (topic1,4,0) -> OfflineReplica, (topic1,9,2) -> OnlineReplica, (topic1,10,0) -> OfflineReplica, (topic1,7,0) -> OfflineReplica, (topic1,6,2) -> OnlineReplica, (topic1,5,1) -> OnlineReplica, (topic1,1,2) -> OnlineReplica) (kafka.controller.ReplicaStateMachine)
[2014-05-23 10:30:57,196] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [topic1,10]. Select 1 from ISR 1 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,197] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR { "ISR":"1", "leader":"1", "leaderEpoch":"1" } for offline partition [topic1,10] (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,206] DEBUG [Partition state machine on Controller 2]: After leader election, leader cache is updated to Map([topic1,0] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,3] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,10] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,8] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,6] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,2] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,9] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,4] -> (Leader:0,ISR:0,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,5] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,7] -> (Leader:0,ISR:0,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,1] -> (Leader:0,ISR:0,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,11] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1)) (kafka.controller.PartitionStateMachine)
[2014-05-23 10:30:57,227] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [topic1,4]. Select 1 from ISR 1 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,227] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR { "ISR":"1", "leader":"1", "leaderEpoch":"1" } for offline partition [topic1,4] (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,325] DEBUG [Partition state machine on Controller 2]: After leader election, leader cache is updated to Map([topic1,0] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,3] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,10] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,8] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,6] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,2] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,9] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,4] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,5] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,7] -> (Leader:0,ISR:0,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,1] -> (Leader:0,ISR:0,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,11] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1)) (kafka.controller.PartitionStateMachine)
[2014-05-23 10:30:57,329] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [topic1,7]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,329] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR { "ISR":"2", "leader":"2", "leaderEpoch":"1" } for offline partition [topic1,7] (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,368] DEBUG [Partition state machine on Controller 2]: After leader election, leader cache is updated to Map([topic1,0] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,3] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,10] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,8] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,6] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,2] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,9] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,4] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,5] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,7] -> (Leader:2,ISR:2,LeaderEpoch:1,ControllerEpoch:2), [topic1,1] -> (Leader:0,ISR:0,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,11] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1)) (kafka.controller.PartitionStateMachine)
[2014-05-23 10:30:57,377] DEBUG [OfflinePartitionLeaderSelector]: Some broker in ISR is alive for [topic1,1]. Select 2 from ISR 2 to be the leader. (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,377] INFO [OfflinePartitionLeaderSelector]: Selected new leader and ISR { "ISR":"2", "leader":"2", "leaderEpoch":"1" } for offline partition [topic1,1] (kafka.controller.OfflinePartitionLeaderSelector)
[2014-05-23 10:30:57,386] DEBUG [Partition state machine on Controller 2]: After leader election, leader cache is updated to Map([topic1,0] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,3] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,10] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,8] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,6] -> (Leader:2,ISR:2,1,LeaderEpoch:0,ControllerEpoch:1), [topic1,2] -> (Leader:1,ISR:1,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,9] -> (Leader:2,ISR:2,0,LeaderEpoch:0,ControllerEpoch:1), [topic1,4] -> (Leader:1,ISR:1,LeaderEpoch:1,ControllerEpoch:2), [topic1,5] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1), [topic1,7] -> (Leader:2,ISR:2,LeaderEpoch:1,ControllerEpoch:2), [topic1,1] -> (Leader:2,ISR:2,LeaderEpoch:1,ControllerEpoch:2), [topic1,11] -> (Leader:1,ISR:1,2,LeaderEpoch:0,ControllerEpoch:1)) (kafka.controller.PartitionStateMachine)
[2014-05-23 10:30:57,391] INFO [Partition state machine on Controller 2]: Started partition state machine with initial state -> Map([topic1,0] -> OnlinePartition, [topic1,3] -> OnlinePartition, [topic1,10] -> OnlinePartition, [topic1,8] -> OnlinePartition, [topic1,6] -> OnlinePartition, [topic1,2] -> OnlinePartition, [topic1,9] -> OnlinePartition, [topic1,4] -> OnlinePartition, [topic1,5] -> OnlinePartition, [topic1,7] -> OnlinePartition, [topic1,1] -> OnlinePartition, [topic1,11] -> OnlinePartition) (kafka.controller.PartitionStateMachine)
[2014-05-23 10:30:57,405] INFO [Controller 2]: Broker 2 is ready to serve as the new controller with epoch 2 (kafka.controller.KafkaController)
[2014-05-23 10:30:57,408] INFO [Controller 2]: Partitions being reassigned: Map() (kafka.controller.KafkaController)
[2014-05-23 10:30:57,412] INFO [Controller 2]: Partitions already reassigned: List() (kafka.controller.KafkaController)
[2014-05-23 10:30:57,414] INFO [Controller 2]: Resuming reassignment of partitions: Map() (kafka.controller.KafkaController)
[2014-05-23 10:30:57,432] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
[2014-05-23 10:30:57,433] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
[2014-05-23 10:30:57,435] INFO [Partition state machine on Controller 2]: Invoking state change to OnlinePartition for partitions (kafka.controller.PartitionStateMachine)
[2014-05-23 10:30:57,519] DEBUG [ControllerEpochListener on 2]: Controller epoch listener fired with new epoch 2 (kafka.controller.ControllerEpochListener)
[2014-05-23 10:30:57,520] INFO [ControllerEpochListener on 2]: Initialized controller epoch to 2 and zk version 1 (kafka.controller.ControllerEpochListener)
[2014-05-23 12:19:38,719] DEBUG preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@2dcb25f1, name=log4j:logger=kafka.controller (kafka.controller)
[2014-05-23 12:19:48,239] INFO [ControllerEpochListener on 2]: Initialized controller epoch to 5 and zk version 4 (kafka.controller.ControllerEpochListener)
[2014-05-23 12:19:48,552] INFO [Controller 2]: Controller starting up (kafka.controller.KafkaController)
[2014-05-23 12:19:48,910] INFO [Controller 2]: Controller startup complete (kafka.controller.KafkaController)
[2014-05-23 12:23:33,743] DEBUG preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@2dcb25f1, name=log4j:logger=kafka.controller (kafka.controller)
[2014-05-23 12:26:59,840] INFO [ControllerEpochListener on 2]: Initialized controller epoch to 5 and zk version 4 (kafka.controller.ControllerEpochListener)
[2014-05-23 12:27:00,052] INFO [Controller 2]: Controller starting up (kafka.controller.KafkaController)
[2014-05-23 12:27:00,351] INFO [Controller 2]: Controller startup complete (kafka.controller.KafkaController)

And there is no log after this.
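For reference, checking ZooKeeper directly can show which broker currently holds the controller role and which broker ids are still registered as alive. A minimal sketch using the zkCli.sh that ships with ZooKeeper, assuming the brokers register under the default chroot (adjust the host/port and broker id as needed):

# Which broker currently holds the controller role (and its epoch)
bin/zkCli.sh -server server1:2181 get /controller
# Ephemeral registrations of the brokers ZooKeeper currently sees as alive
bin/zkCli.sh -server server1:2181 ls /brokers/ids
# Registration details for one specific broker id (2 is just an example here)
bin/zkCli.sh -server server1:2181 get /brokers/ids/2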

On Thursday 19 June 2014 08:41 PM, Guozhang Wang wrote:
Could you check the controller log to see whether broker 2 had a soft
failure at some point, and hence its leadership was migrated to other brokers?
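
A soft failure usually shows up as a ZooKeeper session expiration in the broker's own logs, followed by leader/ISR changes recorded by the new controller. A rough way to check, assuming the log file names from the default log4j.properties that ships with Kafka (the exact message text varies across versions):

# Session expirations on the suspect broker
grep -i "expired" logs/server.log logs/controller.log
# Recent leader/ISR changes recorded in the state-change log
grep "LeaderAndIsr" logs/state-change.log | tail -n 50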


On Thu, Jun 19, 2014 at 6:57 AM, Arjun <ar...@socialtwist.com> wrote:

Hi,

I have a setup of 3 Kafka servers, with a replication factor of 2.
I have only one topic in this setup as of now.

bin/kafka-list-topic.sh --zookeeper server1:2181,server2:2181,server3:2181
--topic topic1
topic: topic1    partition: 0    leader: 1    replicas: 2,1    isr: 1
topic: topic1    partition: 1    leader: 0    replicas: 0,2    isr: 0
topic: topic1    partition: 2    leader: 1    replicas: 1,0    isr: 0,1
topic: topic1    partition: 3    leader: 0    replicas: 2,0    isr: 0
topic: topic1    partition: 4    leader: 0    replicas: 0,1    isr: 0,1
topic: topic1    partition: 5    leader: 1    replicas: 1,2    isr: 1
topic: topic1    partition: 6    leader: 1    replicas: 2,1    isr: 1
topic: topic1    partition: 7    leader: 0    replicas: 0,2    isr: 0
topic: topic1    partition: 8    leader: 1    replicas: 1,0    isr: 0,1
topic: topic1    partition: 9    leader: 0    replicas: 2,0    isr: 0
topic: topic1    partition: 10    leader: 0    replicas: 0,1    isr: 0,1
topic: topic1    partition: 11    leader: 1    replicas: 1,2    isr: 1
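
To pull out just the under-replicated partitions from a listing like this, a small filter over the same command works (a sketch only; the awk field positions assume the exact column layout shown above):

bin/kafka-list-topic.sh --zookeeper server1:2181,server2:2181,server3:2181 --topic topic1 \
  | awk '{ n = split($8, r, ","); m = split($10, i, ","); if (m < n) print "partition", $4, "replicas=" $8, "isr=" $10 }'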

The third broker is not in the ISR list. There are no errors in the logs, and
the thread dump doesn't have any thread with "ReplicaFetcherManager".
*Thread Dump*
------------------------------------------------------------
2014-06-19 13:27:39
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode):

"RMI TCP Connection(idle)" daemon prio=10 tid=0x00007fccec004800
nid=0x201f waiting on condition [0x00007fcce540f000]
    java.lang.Thread.State: TIMED_WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for  <0x00000000bc30e6c8> (a
java.util.concurrent.SynchronousQueue$TransferStack)
         at java.util.concurrent.locks.LockSupport.parkNanos(
LockSupport.java:196)
         at java.util.concurrent.SynchronousQueue$
TransferStack.awaitFulfill(SynchronousQueue.java:424)
         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(
SynchronousQueue.java:323)
         at java.util.concurrent.SynchronousQueue.poll(
SynchronousQueue.java:874)
         at java.util.concurrent.ThreadPoolExecutor.getTask(
ThreadPoolExecutor.java:945)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:907)
         at java.lang.Thread.run(Thread.java:662)

"JMX server connection timeout 30" daemon prio=10 tid=0x00007fccf800a800
nid=0x555 in Object.wait() [0x00007fcce530e000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$
Timeout.run(ServerCommunicatorAdmin.java:150)
         - locked <0x00000000bc39a640> (a [I)
         at java.lang.Thread.run(Thread.java:662)

"RMI Scheduler(0)" daemon prio=10 tid=0x00007fccf0040000 nid=0x550 waiting
on condition [0x00007fcce5510000]
    java.lang.Thread.State: TIMED_WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for  <0x00000000bc2e1fe8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
         at java.util.concurrent.locks.LockSupport.parkNanos(
LockSupport.java:196)
         at java.util.concurrent.locks.AbstractQueuedSynchronizer$
ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
         at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
         at java.util.concurrent.ScheduledThreadPoolExecutor$
DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
         at java.util.concurrent.ScheduledThreadPoolExecutor$
DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
         at java.util.concurrent.ThreadPoolExecutor.getTask(
ThreadPoolExecutor.java:947)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:907)
         at java.lang.Thread.run(Thread.java:662)

"kafka-logflusher-1" daemon prio=10 tid=0x00007fcd102b9800 nid=0x54d
waiting on condition [0x00007fcce5813000]
    java.lang.Thread.State: TIMED_WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
------------------------------------------------------------
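
For what it's worth, a quick way to confirm from the shell whether any replica fetcher threads exist on that broker (a sketch; it assumes the broker's main class is kafka.Kafka, and that on this version the fetchers are named ReplicaFetcherThread-*):

# Find the broker's PID and grep its thread dump for replica fetchers
BROKER_PID=$(jps -l | awk '/kafka\.Kafka/ {print $1}')
jstack "$BROKER_PID" | grep -i "ReplicaFetcher"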

I haven't seen any GC pauses in the system. JMX max lag
("kafka.server":name="([-.\w]+)-MaxLag",type="ReplicaFetcherManager") for this
node is 0.
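
In case it helps to watch that gauge from the command line, Kafka's bundled JmxTool can poll it (a sketch; broker3-host is a placeholder for the lagging broker's hostname, remote JMX is assumed to be enabled on port 9999, and the concrete MBean name is assumed to resolve to "Replica-MaxLag" — use whatever name jconsole shows if it differs):

bin/kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://broker3-host:9999/jmxrmi \
  --object-name '"kafka.server":type="ReplicaFetcherManager",name="Replica-MaxLag"'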

We have restarted the nodes one after the other, but we cannot get this node
back into the ISR.
Can someone please let me know how to bring this node back into the ISR?
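
(For reference, my understanding is that a replica only re-enters the ISR once its fetcher has caught up within the replica.lag.time.max.ms / replica.lag.max.messages limits, so the usual sequence is to restart the lagging broker, wait for it to reappear in the ISR, and only then rebalance leadership. A rough sketch of those steps, using the same ZooKeeper connect string as above and run on the lagging broker itself:)

# On the lagging broker: restart it so the replica fetchers start from a clean state
bin/kafka-server-stop.sh
nohup bin/kafka-server-start.sh config/server.properties > kafka.out 2>&1 &
# Once the broker shows up in the ISR again, move leadership back to the preferred replicas
bin/kafka-preferred-replica-election.sh --zookeeper server1:2181,server2:2181,server3:2181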


Thanks
Arjun Narasimha Kota




