Could you check the controller log to see whether broker 2 had a soft failure at some point, which would have caused its leadership to be migrated to the other brokers?
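To make it easier to see which replicas are missing from the ISR, here is a rough Python sketch that parses output in the format quoted below. It assumes each partition line looks exactly like the `kafka-list-topic.sh` lines in this thread; `missing_from_isr`, `LINE_RE` and `SAMPLE` are names made up for this illustration and are not part of Kafka.

```python
import re
from collections import defaultdict

# Assumed line format, copied from the output quoted in this thread:
#   topic: topic1 partition: 0 leader: 1 replicas: 2,1 isr: 1
LINE_RE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(csv):
    """Turn a comma-separated broker list such as '2,1' into [2, 1]."""
    return [int(x) for x in csv.split(",") if x]


def missing_from_isr(text):
    """Map each broker id to the partitions where it is an assigned
    replica but does not appear in the ISR."""
    missing = defaultdict(list)
    for m in LINE_RE.finditer(text):
        partition = int(m.group("partition"))
        isr = set(parse_ids(m.group("isr")))
        for broker in parse_ids(m.group("replicas")):
            if broker not in isr:
                missing[broker].append(partition)
    return {broker: sorted(parts) for broker, parts in missing.items()}


# First three partition lines from the quoted output.
SAMPLE = """\
topic: topic1 partition: 0 leader: 1 replicas: 2,1 isr: 1
topic: topic1 partition: 1 leader: 0 replicas: 0,2 isr: 0
topic: topic1 partition: 2 leader: 1 replicas: 1,0 isr: 0,1
"""

print(missing_from_isr(SAMPLE))  # {2: [0, 1]}
```

Run against the full listing, this should report broker 2 as the only replica outside the ISR, which is why the controller log for that broker is the first place to look.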

On Thu, Jun 19, 2014 at 6:57 AM, Arjun <ar...@socialtwist.com> wrote:

> Hi,
>
> I have a set up of 3 kafka servers, with a replication factor of 2.
> I have only one topic in this setup as of now.
>
> bin/kafka-list-topic.sh --zookeeper server1:2181,server2:2181,server3:2181
> --topic topic1
> topic: topic1 partition: 0 leader: 1 replicas: 2,1 isr: 1
> topic: topic1 partition: 1 leader: 0 replicas: 0,2 isr: 0
> topic: topic1 partition: 2 leader: 1 replicas: 1,0 isr: 0,1
> topic: topic1 partition: 3 leader: 0 replicas: 2,0 isr: 0
> topic: topic1 partition: 4 leader: 0 replicas: 0,1 isr: 0,1
> topic: topic1 partition: 5 leader: 1 replicas: 1,2 isr: 1
> topic: topic1 partition: 6 leader: 1 replicas: 2,1 isr: 1
> topic: topic1 partition: 7 leader: 0 replicas: 0,2 isr: 0
> topic: topic1 partition: 8 leader: 1 replicas: 1,0 isr: 0,1
> topic: topic1 partition: 9 leader: 0 replicas: 2,0 isr: 0
> topic: topic1 partition: 10 leader: 0 replicas: 0,1 isr: 0,1
> topic: topic1 partition: 11 leader: 1 replicas: 1,2 isr: 1
>
> The Third broker is not in the ISR list. There are no errors in the logs.
> The thread dump doesn't have any thread with "ReplicaFetcherManager" in its name.
>
> Thread Dump
> ------------------------------------------------------------
> 2014-06-19 13:27:39
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode):
>
> "RMI TCP Connection(idle)" daemon prio=10 tid=0x00007fccec004800 nid=0x201f waiting on condition [0x00007fcce540f000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000000bc30e6c8> (a java.util.concurrent.SynchronousQueue$TransferStack)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
>         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
>         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
>         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:662)
>
> "JMX server connection timeout 30" daemon prio=10 tid=0x00007fccf800a800 nid=0x555 in Object.wait() [0x00007fcce530e000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
>         - locked <0x00000000bc39a640> (a [I)
>         at java.lang.Thread.run(Thread.java:662)
>
> "RMI Scheduler(0)" daemon prio=10 tid=0x00007fccf0040000 nid=0x550 waiting on condition [0x00007fcce5510000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000000bc2e1fe8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>         at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:662)
>
> "kafka-logflusher-1" daemon prio=10 tid=0x00007fcd102b9800 nid=0x54d waiting on condition [0x00007fcce5813000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
> ------------------------------------------------------------
>
> I haven't seen any GC pauses in the system. The JMX max lag
> ("kafka.server":name="([-.\w]+)-MaxLag",type="ReplicaFetcherManager") for
> this node is 0.
>
> We have restarted the nodes one after the other, but we can't get this node
> back into the ISR.
> Can someone please let me know how to get this node into the ISR?
>
>
> Thanks
> Arjun Narasimha Kota

--
-- Guozhang