1. Yes, you can do a rolling upgrade of brokers from 0.8.2 to 0.9.0. The
important thing is to upgrade the brokers before you upgrade any of the
clients.

2. I'm not aware of any issues between 0.9.0 and Spark Streaming. However,
definitely do your own testing to make sure.
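
To expand on 1), the rolling procedure in the 0.9.0.0 upgrade notes looks
roughly like this (a sketch, assuming you start from a stock 0.8.2.x
configuration):

  # server.properties on every broker, before touching any code:
  inter.broker.protocol.version=0.8.2.X

Then upgrade and restart the brokers one at a time. Once the whole cluster
is running 0.9.0, bump the setting:

  # server.properties, once all brokers run 0.9.0:
  inter.broker.protocol.version=0.9.0.0

and do one more rolling restart so the brokers switch over to the new
inter-broker protocol.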

On Wed, Nov 25, 2015 at 11:25 AM, Qi Xu <shkir...@gmail.com> wrote:

> Hi Gwen,
> Yes, we're going to upgrade to the 0.9.0 version. Regarding the upgrade,
> we definitely don't want any downtime on our cluster, so the upgrade will
> be machine by machine. Will the 0.9.0 release work alongside the August
> build in the same Kafka cluster?
> Also, we currently run a Spark Streaming job (with Scala 2.10) against the
> cluster. Are there any known issues with 0.9.0 in this scenario?
>
> Thanks,
> Tony
>
>
> On Mon, Nov 23, 2015 at 5:41 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > We've fixed many, many bugs since August. Since we are about to release
> > 0.9.0 (with SSL!), maybe wait a day and go with a released and tested
> > version.
> >
> > On Mon, Nov 23, 2015 at 3:01 PM, Qi Xu <shkir...@gmail.com> wrote:
> >
> > > Forgot to mention that the Kafka version we're using is from the
> > > August trunk branch, which has SSL support.
> > >
> > > Thanks again,
> > > Qi
> > >
> > >
> > > On Mon, Nov 23, 2015 at 2:29 PM, Qi Xu <shkir...@gmail.com> wrote:
> > >
> > >> Looping in another colleague from our team.
> > >>
> > >> On Mon, Nov 23, 2015 at 2:26 PM, Qi Xu <shkir...@gmail.com> wrote:
> > >>
> > >>> Hi folks,
> > >>> We have a 10-node cluster with several topics. Each topic has about
> > >>> 256 partitions with a replication factor of 3. We have run into an
> > >>> issue where, in some topics, a few partitions (< 10) have a leader of
> > >>> -1 and each of them has only one in-sync replica.
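> > >>>
> > >>> Something like the following should list the affected partitions (a
> > >>> sketch, assuming the stock kafka-topics.sh tool; <zookeeper> is a
> > >>> placeholder for our ZooKeeper connect string):
> > >>>
> > >>>   # partitions whose leader is unavailable (leader == -1)
> > >>>   bin/kafka-topics.sh --zookeeper <zookeeper> --describe \
> > >>>     --topic userlogs --unavailable-partitions
> > >>>
> > >>>   # partitions whose ISR is smaller than the replication factor
> > >>>   bin/kafka-topics.sh --zookeeper <zookeeper> --describe \
> > >>>     --topic userlogs --under-replicated-partitions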
> > >>>
> > >>> From Kafka Manager, here are the snapshots:
> > >>> [image: Inline image 2]
> > >>>
> > >>> [image: Inline image 1]
> > >>>
> > >>> here's the state log:
> > >>> [2015-11-23 21:57:58,598] ERROR Controller 1 epoch 435499 initiated state change for partition [userlogs,84] from OnlinePartition to OnlinePartition failed (state.change.logger)
> > >>> kafka.common.StateChangeFailedException: encountered error while electing leader for partition [userlogs,84] due to: Preferred replica 0 for partition [userlogs,84] is either not alive or not in the isr. Current leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}].
> > >>> Caused by: kafka.common.StateChangeFailedException: Preferred replica 0 for partition [userlogs,84] is either not alive or not in the isr. Current leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}]
> > >>>
> > >>> My questions are:
> > >>> 1) How could this happen, and how can I fix it or work around it?
> > >>> 2) Are 256 partitions per topic too many? We have about 200+ cores
> > >>> for the Spark Streaming job.
> > >>>
> > >>> Thanks,
> > >>> Qi
> > >>>
> > >>>
> > >>
> > >
> >
>
