Could you check the controller log to see if broker 2 once has a soft
failure and hence its leadership been migrated to other brokers?


On Thu, Jun 19, 2014 at 6:57 AM, Arjun <ar...@socialtwist.com> wrote:

> Hi,
>
> I have a set up of 3 kafka servers, with a replication factor of 2.
> I have only one topic in this setup as of now.
>
> bin/kafka-list-topic.sh --zookeeper server1:2181,server2:2181,server3:2181
> --topic topic1
> topic: topic1    partition: 0    leader: 1    replicas: 2,1    isr: 1
> topic: topic1    partition: 1    leader: 0    replicas: 0,2    isr: 0
> topic: topic1    partition: 2    leader: 1    replicas: 1,0    isr: 0,1
> topic: topic1    partition: 3    leader: 0    replicas: 2,0    isr: 0
> topic: topic1    partition: 4    leader: 0    replicas: 0,1    isr: 0,1
> topic: topic1    partition: 5    leader: 1    replicas: 1,2    isr: 1
> topic: topic1    partition: 6    leader: 1    replicas: 2,1    isr: 1
> topic: topic1    partition: 7    leader: 0    replicas: 0,2    isr: 0
> topic: topic1    partition: 8    leader: 1    replicas: 1,0    isr: 0,1
> topic: topic1    partition: 9    leader: 0    replicas: 2,0    isr: 0
> topic: topic1    partition: 10    leader: 0    replicas: 0,1    isr: 0,1
> topic: topic1    partition: 11    leader: 1    replicas: 1,2    isr: 1
>
> The Third broker is not in the ISR list. There are no errors in the logs.
> The Thread dump doesn't have any thread with "RepliacaFetcherManager"
> *Thread Dump
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------
> *2014-06-19 13:27:39
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode):
>
> "RMI TCP Connection(idle)" daemon prio=10 tid=0x00007fccec004800
> nid=0x201f waiting on condition [0x00007fcce540f000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000bc30e6c8> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
>         at java.util.concurrent.locks.LockSupport.parkNanos(
> LockSupport.java:196)
>         at java.util.concurrent.SynchronousQueue$
> TransferStack.awaitFulfill(SynchronousQueue.java:424)
>         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(
> SynchronousQueue.java:323)
>         at java.util.concurrent.SynchronousQueue.poll(
> SynchronousQueue.java:874)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(
> ThreadPoolExecutor.java:945)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:662)
>
> "JMX server connection timeout 30" daemon prio=10 tid=0x00007fccf800a800
> nid=0x555 in Object.wait() [0x00007fcce530e000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$
> Timeout.run(ServerCommunicatorAdmin.java:150)
>         - locked <0x00000000bc39a640> (a [I)
>         at java.lang.Thread.run(Thread.java:662)
>
> "RMI Scheduler(0)" daemon prio=10 tid=0x00007fccf0040000 nid=0x550 waiting
> on condition [0x00007fcce5510000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000bc2e1fe8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.parkNanos(
> LockSupport.java:196)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$
> ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
>         at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$
> DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$
> DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
>         at java.util.concurrent.ThreadPoolExecutor.getTask(
> ThreadPoolExecutor.java:947)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:662)
>
> "kafka-logflusher-1" daemon prio=10 tid=0x00007fcd102b9800 nid=0x54d
> waiting on condition [0x00007fcce5813000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
> ------------------------------------------------------------
> ------------------------------------------------------------
> -----------------------------------------------
>
> I haven't seen any GC pauses in the system. JMX max lag (
> "kafka.server":name="([-.\w]+)-MaxLag",type="ReplicaFetcherManager") for
> this node is 0.
>
> We have restarted the nodes one after the other and we cant make this node
> to push to ISR.
> Can some one please  let me know, how to push this node to ISR.
>
>
> Thanks
> Arjun Narasimha Kota
>
>
>


-- 
-- Guozhang

Reply via email to