My state-change.log:

[2016-05-28 12:33:31,510] ERROR Controller 1008 epoch 13 aborted leader
election for partition [consup-0000000043,78] since the LeaderAndIsr path
was already written by another controller. This probably means that the
current controller 1008 went through a soft failure and another controller
was elected with epoch 14. (state.change.logger)
[2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(1020,-1) for a request sent to broker id:1009,host:
consup-kafka22.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(1066,-1) for a request sent to broker id:1001,host:
consup-kafka20.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(777,-1) for a request sent to broker id:1006,host:
consup-kafka11.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(742,-1) for a request sent to broker id:1018,host:
consup-kafka10.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(1021,-1) for a request sent to broker id:1009,host:
consup-kafka22.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,511] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(1067,-1) for a request sent to broker id:1001,host:
consup-kafka20.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,511] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(778,-1) for a request sent to broker id:1006,host:
consup-kafka11.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,511] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(1022,-1) for a request sent to broker id:1009,host:
consup-kafka22.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,511] ERROR Controller 1008 epoch 13 encountered error
while electing leader for partition [consup-0000000043,78] due to: aborted
leader election for partition [consup-0000000043,78] since the LeaderAndIsr
path was already written by another controller. This probably means that
the current controller 1008 went through a soft failure and another
controller was elected with epoch 14.. (state.change.logger)
[2016-05-28 12:33:31,511] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(743,-1) for a request sent to broker id:1018,host:
consup-kafka10.com,port:9092 (state.change.logger)
[2016-05-28 12:33:31,511] ERROR Controller 1008 epoch 13 initiated state
change for partition [consup-0000000043,78] from OnlinePartition to
OnlinePartition failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [consup-0000000043,78] due to: aborted leader election
for partition [consup-0000000043,78] since the LeaderAndIsr path was
already written by another controller. This probably means that the current
controller 1008 went through a soft failure and another controller was
elected with epoch 14..
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:208)
        at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:146)
        at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:145)
        at scala.collection.immutable.Set$Set1.foreach(Set.scala:79)
        at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:145)
        at kafka.controller.KafkaController.onPreferredReplicaElection(KafkaController.scala:631)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17$$anonfun$apply$5.apply$mcV$sp(KafkaController.scala:1158)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17$$anonfun$apply$5.apply(KafkaController.scala:1153)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17$$anonfun$apply$5.apply(KafkaController.scala:1153)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17.apply(KafkaController.scala:1150)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17.apply(KafkaController.scala:1148)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4.apply(KafkaController.scala:1148)
        at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4.apply(KafkaController.scala:1127)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
        at kafka.controller.KafkaController.kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance(KafkaController.scala:1127)
        at kafka.controller.KafkaController$$anonfun$onControllerFailover$1.apply$mcV$sp(KafkaController.scala:326)
        at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:99)
        at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
Caused by: kafka.common.StateChangeFailedException: aborted leader election
for partition [consup-0000000043,78] since the LeaderAndIsr path was
already written by another controller. This probably means that the current
controller 1008 went through a soft failure and another controller was
elected with epoch 14.
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
        ... 33 more
[2016-05-28 12:33:31,511] TRACE Controller 1008 epoch 13 received response
UpdateMetadataResponse(744,-1) for a request sent to broker id:1018,host:
consup-kafka10.com,port:9092 (state.change.logger)
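
What the ERROR lines describe is controller-epoch fencing on the LeaderAndIsr
path: a controller that went through a soft failure (epoch 13 here) aborts its
leader election as soon as it finds the path already written by the newly
elected controller (epoch 14). Below is a minimal, hypothetical Scala sketch of
that check, not Kafka's actual code; the LeaderAndIsr fields, the leaderEpoch
and ISR values, and the helper names are illustrative assumptions only.

// Hypothetical sketch of the epoch fencing described by the ERROR lines above.
// NOT Kafka's implementation: an old controller (epoch 13) aborts an election
// once it sees the LeaderAndIsr path written by a newer controller (epoch 14).

case class LeaderAndIsr(leader: Int, leaderEpoch: Int, isr: List[Int], controllerEpoch: Int)

class StateChangeFailedException(msg: String) extends RuntimeException(msg)

object EpochFencingSketch {

  // Abort if the path was already written by a controller with a higher epoch;
  // otherwise the election proceeds and the path is (re)written with our epoch.
  def electLeader(partition: String, myEpoch: Int, pathValue: Option[LeaderAndIsr]): Unit =
    pathValue match {
      case Some(written) if written.controllerEpoch > myEpoch =>
        throw new StateChangeFailedException(
          s"aborted leader election for partition $partition since the LeaderAndIsr " +
          s"path was already written by another controller (epoch ${written.controllerEpoch} > $myEpoch)")
      case _ =>
        println(s"electing leader for $partition at controller epoch $myEpoch")
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the log: controller epoch 13 finds a write stamped with epoch 14.
    // leader/leaderEpoch/isr values here are made up for the example.
    val written = Some(LeaderAndIsr(leader = 1009, leaderEpoch = 7, isr = List(1009, 1018), controllerEpoch = 14))
    try electLeader("[consup-0000000043,78]", myEpoch = 13, pathValue = written)
    catch { case e: StateChangeFailedException => println("ERROR " + e.getMessage) }
  }
}

Read that way, the aborted election on the epoch-13 controller looks like the
expected outcome once an epoch-14 controller exists; the surrounding TRACE
lines appear to be responses arriving for UpdateMetadata requests the old
controller had already sent before it lost the controllership.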


2016-05-28 15:31 GMT+08:00 Muqtafi Akhmad <muqt...@traveloka.com>:

> Hello Fredo,
>
> Can you elaborate on the 'soft' failure?
>
>
>
> On Sat, May 28, 2016 at 1:53 PM, Fredo Lee <buaatianwa...@gmail.com>
> wrote:
>
> > We have a Kafka cluster with 19 nodes. Every week we suffer a soft
> > failure from this cluster. How do we resolve this problem?
> >
>
>
>
> --
> Muqtafi Akhmad
> Software Engineer
> Traveloka
>
