Re: Controlled shutdown and leader election issues

Ryan Berdeen Mon, 07 Apr 2014 13:51:12 -0700

I think I've figured it out, and it still happens in the 0.8.1 branch. The
code responsible for deleting the key from ZooKeeper is broken and is never
called when using the command line tool, so the tool fails on every run
after the first. I've created
https://issues.apache.org/jira/browse/KAFKA-1365.



On Fri, Apr 4, 2014 at 2:13 AM, Clark Breyman <[email protected]> wrote:

> Done. https://issues.apache.org/jira/browse/KAFKA-1360
>
>
> On Thu, Apr 3, 2014 at 9:13 PM, Neha Narkhede <[email protected]> wrote:
>
> > >> Is there a maven repo for pulling snapshot CI builds from?
> >
> > We still need to get the CI build setup going, could you please file a JIRA
> > for this?
> > Meanwhile, you will have to just build the code yourself for now,
> > unfortunately.
> >
> > Thanks,
> > Neha
> >
> >
> > On Thu, Apr 3, 2014 at 12:01 PM, Clark Breyman <[email protected]> wrote:
> >
> > > Thanks Neha - Is there a maven repo for pulling snapshot CI builds from?
> > > Sorry if this is answered elsewhere.
> > >
> > >
> > > On Wed, Apr 2, 2014 at 7:16 PM, Neha Narkhede <[email protected]> wrote:
> > >
> > > > I'm not so sure if I know the issue you are running into but we fixed a few
> > > > bugs with similar symptoms and the fixes are on the 0.8.1 branch. It will
> > > > be great if you give it a try to see if your issue is resolved.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Wed, Apr 2, 2014 at 12:59 PM, Clark Breyman <[email protected]> wrote:
> > > >
> > > > > Was there an answer for 0.8.1 getting stuck in preferred leader election?
> > > > > I'm seeing this as well. Is there a JIRA ticket on this issue?
> > > > >
> > > > >
> > > > > On Fri, Mar 21, 2014 at 1:15 PM, Ryan Berdeen <[email protected]> wrote:
> > > > >
> > > > > > So, for 0.8 without "controlled.shutdown.enable", why would ShutdownBroker
> > > > > > and restarting cause under-replication and producer exceptions? How can I
> > > > > > upgrade gracefully?
> > > > > >
> > > > > > What's up with 0.8.1 getting stuck in preferred leader election?
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 21, 2014 at 12:18 AM, Neha Narkhede <[email protected]> wrote:
> > > > > >
> > > > > > > Which brings up the question - Do we need ShutdownBroker anymore? It seems
> > > > > > > like the config should handle controlled shutdown correctly anyway.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Neha
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 20, 2014 at 9:16 PM, Jun Rao <[email protected]> wrote:
> > > > > > >
> > > > > > > > We haven't been testing the ShutdownBroker command in 0.8.1 rigorously
> > > > > > > > since in 0.8.1, one can do the controlled shutdown through the new config
> > > > > > > > "controlled.shutdown.enable". Instead of running the ShutdownBroker command
> > > > > > > > during the upgrade, you can also wait until under replicated partition
> > > > > > > > count drops to 0 after each restart before moving to the next one.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Mar 20, 2014 at 3:14 PM, Ryan Berdeen <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > While upgrading from 0.8.0 to 0.8.1 in place, I observed some surprising
> > > > > > > > > behavior using kafka.admin.ShutdownBroker. At the start, there were no
> > > > > > > > > underreplicated partitions. After running
> > > > > > > > >
> > > > > > > > >   bin/kafka-run-class.sh kafka.admin.ShutdownBroker --broker 10 ...
> > > > > > > > >
> > > > > > > > > Partitions that had replicas on broker 10 were under-replicated:
> > > > > > > > >
> > > > > > > > >   bin/kafka-topics.sh --describe --under-replicated-partitions ...
> > > > > > > > >   Topic: analytics-activity Partition: 2  Leader: 12  Replicas: 12,10  Isr: 12
> > > > > > > > >   Topic: analytics-activity Partition: 6  Leader: 11  Replicas: 11,10  Isr: 11
> > > > > > > > >   Topic: analytics-activity Partition: 14 Leader: 14  Replicas: 14,10  Isr: 14
> > > > > > > > >   ...
> > > > > > > > >
> > > > > > > > > While restarting the broker process, many produce requests failed with
> > > > > > > > > kafka.common.UnknownTopicOrPartitionException.
> > > > > > > > >
> > > > > > > > > After each broker restart, I used the preferred leader election tool for
> > > > > > > > > all topics. Now, after finishing all of the broker restarts, the cluster
> > > > > > > > > seems to be stuck in leader election. Running the tool fails with
> > > > > > > > > "kafka.admin.AdminOperationException: Preferred replica leader election
> > > > > > > > > currently in progress..."
> > > > > > > > >
> > > > > > > > > Are any of these known issues? Is there a safer way to shutdown and restart
> > > > > > > > > brokers that does not cause producer failures and under-replicated
> > > > > > > > > partitions?
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
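
For reference, Jun's suggestion quoted above (wait until the under-replicated
partition count drops to 0 after each restart before moving to the next
broker) could be scripted roughly like this. It is only a sketch: the function
names, the poll interval and the timeout are made up for illustration, and it
assumes that the describe command with --under-replicated-partitions prints
nothing at all once every partition is fully replicated.

```python
import subprocess
import time
from typing import Callable, List


def under_replicated_lines(command: List[str]) -> List[str]:
    """Run the describe command and return its non-empty output lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


def wait_until_fully_replicated(
    command: List[str],
    poll_seconds: float = 10.0,      # hypothetical default, tune for your cluster
    timeout_seconds: float = 1800.0,  # hypothetical default, tune for your cluster
    run: Callable[[List[str]], List[str]] = under_replicated_lines,
) -> bool:
    """Poll until the command reports no under-replicated partitions.

    Assumes empty output means "nothing under-replicated".
    Returns True once the output is empty, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if not run(command):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

The command passed in would be the same one used earlier in the thread,
bin/kafka-topics.sh --describe --under-replicated-partitions, split into a
list and followed by whatever connection arguments the cluster needs (they
are elided in the thread, so they are not filled in here). A rolling restart
would call wait_until_fully_replicated after each broker comes back and only
move on to the next broker when it returns True.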
