We have fixed many bugs since August. Since we are about to release 0.9.0
(with SSL!), it may be worth waiting a day and going with a released and
tested version.

On Mon, Nov 23, 2015 at 3:01 PM, Qi Xu <shkir...@gmail.com> wrote:

> Forgot to mention that the Kafka version we're using is from August's
> trunk branch, which has the SSL support.
>
> Thanks again,
> Qi
>
>
> On Mon, Nov 23, 2015 at 2:29 PM, Qi Xu <shkir...@gmail.com> wrote:
>
>> Looping in another colleague from our team.
>>
>> On Mon, Nov 23, 2015 at 2:26 PM, Qi Xu <shkir...@gmail.com> wrote:
>>
>>> Hi folks,
>>> We have a 10-node cluster with several topics. Each topic has about 256
>>> partitions with a replication factor of 3. We have now run into an issue
>>> where, in one topic, a few partitions (< 10) have a leader of -1 and each
>>> of them has only one in-sync replica.
>>>
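A quick way to see which partitions are in this state is the stock topics
tool. This is only a sketch; replace zk1:2181 (a placeholder) with your own
ZooKeeper connection string:

    bin/kafka-topics.sh --describe --zookeeper zk1:2181 --topic userlogs
    bin/kafka-topics.sh --describe --zookeeper zk1:2181 --under-replicated-partitions
    bin/kafka-topics.sh --describe --zookeeper zk1:2181 --unavailable-partitions

The last form lists only partitions whose leader is currently unavailable,
which should correspond to the leader = -1 entries shown below.
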
>>> From the Kafka manager, here's the snapshot:
>>> [image: Inline image 2]
>>>
>>> [image: Inline image 1]
>>>
>>> here's the state log:
>>> [2015-11-23 21:57:58,598] ERROR Controller 1 epoch 435499 initiated
>>> state change for partition [userlogs,84] from OnlinePartition to
>>> OnlinePartition failed (state.change.logger)
>>> kafka.common.StateChangeFailedException: encountered error while
>>> electing leader for partition [userlogs,84] due to: Preferred replica 0 for
>>> partition [userlogs,84] is either not alive or not in the isr. Current
>>> leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}].
>>> Caused by: kafka.common.StateChangeFailedException: Preferred replica 0
>>> for partition [userlogs,84] is either not alive or not in the isr. Current
>>> leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}]
>>>
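The error says the preferred replica (broker 0) is either not alive or not in
the ISR, and the current state is leader -1 with ISR [1]. One way to see what
the controller sees, again using zk1:2181 as a placeholder address, is to
inspect the broker registrations and the partition state znode directly:

    bin/zookeeper-shell.sh zk1:2181
    ls /brokers/ids
    get /brokers/topics/userlogs/partitions/84/state

If broker 0 does not appear under /brokers/ids, its registration is gone (the
broker is down or disconnected from ZooKeeper), which matches the "not alive"
part of the message; the state znode holds the same leader/leader_epoch/isr
JSON quoted in the log above.
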
>>> My questions are:
>>> 1) How could this happen, and how can I fix it or work around it?
>>> 2) Are 256 partitions too many? We have about 200+ cores for the Spark
>>> streaming job.
>>>
>>> Thanks,
>>> Qi
>>>
>>>
>>
>
