Re: Unavailable partitions after upgrade to kafka 1.0.0

Mika Linnanoja Mon, 23 Apr 2018 00:42:47 -0700

Hi,

On Mon, Apr 23, 2018 at 10:25 AM, Brett Rann <br...@zendesk.com.invalid>
wrote:


> Firstly, 1.0.1 is out and I'd strongly advise you to use that as the
> upgrade path over 1.0.0 if you can because it contains a lot of bugfixes.
> Some critical.
>

Yeah, but it would have meant starting the whole process from scratch in all
of our clusters. We had several other clusters on 1.0 in production, and
everything was working A-OK there (with lighter workloads, though), so we
didn't really consider newer versions.

Luckily we have not hit e.g. that file descriptor bug some of our devs were
worried about for 1.0 (https://issues.apache.org/jira/browse/KAFKA-6529).


> With unclean leader elections it should have resolved itself when the
> affected broker came back online and all partitions were available. So
> probably there was an issue there.
>

We moved to the new (mostly default) config file that comes with 1.0, so
unclean leader election was left at its default (disabled), sadly.

As mentioned, enabling it for the affected topics fixed the issue straight
away, but it took a while to understand what was going on, hence some data
loss. Random googling to the rescue; I'm the first to admit I'm no Kafka
expert, to be honest.
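
For anyone landing on this thread later, here is a minimal sketch of the kind
of per-topic override meant above. It assumes the stock kafka-configs.sh tool
from the 1.0 distribution, which talks to ZooKeeper. The helper name, the
ZooKeeper address, the topic name and the script path are placeholders for
illustration, not our actual setup. The helper only builds the command; it
does not run it.

```python
import shlex


def unclean_election_command(zookeeper, topic, enable=True,
                             script="bin/kafka-configs.sh"):
    """Build the argv that sets unclean.leader.election.enable on one topic.

    Hypothetical helper: it only assembles the command line and never
    contacts a cluster.
    """
    value = "true" if enable else "false"
    return [
        script,
        "--zookeeper", zookeeper,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", f"unclean.leader.election.enable={value}",
    ]


if __name__ == "__main__":
    # Placeholder ZooKeeper address and topic name (assumptions).
    cmd = unclean_election_command("localhost:2181", "my-affected-topic")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The override is per topic, so the same command with enable=False sets it back
once the partitions have leaders again. Keep in mind that an unclean election
can itself drop messages that the out-of-sync replica never received.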

> Personally I had a lot of struggles upgrading off of 0.10 with bugged large
> consumer offset partitions (10s and 100s of GBs) that had stopped
> compacting and should have been in the MBs. The largest ones took 45
> minutes to compact which spread out the rolling upgrade time significantly.
> Also occasionally even with a clean shutdown there was corruption detected
> on broker start and it took time for the repair -- a /lot/ of time. In both
> cases it was easily seen in the logs, and significantly increased disk IO
> metrics on boot (and metrics for FD use gradually returning to previous
> levels).
>

Good to know. I didn't see anything odd in the usual instance-level metrics
before, during or after the rolling upgrade.

> Was it all with the one broker, or across multiple?  Did you follow the
> rolling upgrade procedure? At what point in the rolling process did the
> first issue appear?
>
> https://kafka.apache.org/10/documentation/#upgrade  (that's for 1.0.x)
>

We have the software installed via Puppet, so the process was not exactly
according to the official guide, but I naturally read that first.

It was mostly a matter of updating the version variable in our (masterless)
Puppet config file and applying it manually per instance. It works
surprisingly well this way.

We just got rid of one ancient 0.7 Kafka cluster, so overall I'm very happy
with the newer versions. GJ, all contributors.

Mika
