I would recommend allowing each broker enough time to catch up before
starting the next, but this may be less of a concern if the entire cluster
is being brought down and then started from scratch.  To automate this, we
poll until the Kafka process binds to its configured port (9092), and then,
once all brokers are up, we also poll until there are no longer any
under-replicated partitions before performing further maintenance.
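
The two polling steps described above could be sketched roughly as follows.
This is a minimal illustration in Python 3.7+, not the actual automation in
use.  The function names, timeouts, and the script path in the usage comment
are assumptions.  The under-replicated check assumes that
kafka-topics.sh --describe --under-replicated-partitions prints nothing when
every partition is fully replicated.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds, or give up."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # broker not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_until(predicate, timeout=600.0, interval=5.0):
    """Poll an arbitrary zero-argument check until it returns True."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def has_no_under_replicated(describe_output):
    """Assumption: blank tool output means no under-replicated partitions."""
    return describe_output.strip() == ""


def describe_under_replicated(kafka_topics, zookeeper):
    """Run the topics tool; both arguments are site-specific assumptions."""
    result = subprocess.run(
        [kafka_topics, "--describe", "--under-replicated-partitions",
         "--zookeeper", zookeeper],
        capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical usage (host names and paths are placeholders):
#   for host in ("broker0", "broker1"):
#       assert wait_for_port(host, 9092)
#   wait_until(lambda: has_no_under_replicated(
#       describe_under_replicated("/opt/kafka/bin/kafka-topics.sh",
#                                 "zk1:2181")))
```

Keeping the port check and the replication check as separate steps means
the second wait only begins once every broker is accepting connections.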

Carl

On Fri, Jul 21, 2017 at 10:50 AM, Chris Neal <cwn...@gmail.com> wrote:

> Thanks Carl.
>
> Always fun to do this stuff in production... ;)
>
> Appreciate the input.  I'll try a full cycle and see how that works.
>
> In your opinion, if I stop all brokers and all Zookeeper nodes, then
> restart all Zookeepers...at that point can I start both brokers at the same
> time, or should I let one broker fully start and read all the unflushed
> segments from disk before starting the second broker?
>
> Again, many thanks.
> Chris
>
>
> On Fri, Jul 21, 2017 at 12:13 PM, Carl Haferd <chaf...@groupon.com.invalid>
> wrote:
>
> > I have encountered similar difficulties in a test environment and it may
> > be necessary to stop the Kafka process on each broker and take Zookeeper
> > offline before removing the files and zookeeper paths.  Otherwise there
> > may be a race condition between brokers which could cause the cluster to
> > retain information for the topic.
> >
> > Carl
> >
> > On Fri, Jul 21, 2017 at 9:06 AM, Chris Neal <cwn...@gmail.com> wrote:
> >
> > > Welp.  Surprisingly, that did not fix the problem. :(
> > >
> > > I cleaned out all the entries for these topics from /config/topics, and
> > > removed the logs from the file system for those topics, and the
> > > messages are still flying by in the server.log file.
> > >
> > > Also, more concerning, when I was looking through the log files for the
> > > other broker in the cluster, I noticed the same type of message for a
> > > topic that should actually be there:
> > >
> > > [2017-07-21 16:03:29,140] ERROR Conditional update of path
> > > /brokers/topics/perf_dstorage_raw/partitions/4/state with data
> > > {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]}
> > > and expected version 0 failed due to
> > > org.apache.zookeeper.KeeperException$BadVersionException:
> > > KeeperErrorCode = BadVersion for
> > > /brokers/topics/perf_dstorage_raw/partitions/4/state
> > > (kafka.utils.ZkUtils$)
> > > [2017-07-21 16:03:29,142] ERROR Conditional update of path
> > > /brokers/topics/perf_dstorage_raw/partitions/0/state with data
> > > {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]}
> > > and expected version 0 failed due to
> > > org.apache.zookeeper.KeeperException$BadVersionException:
> > > KeeperErrorCode = BadVersion for
> > > /brokers/topics/perf_dstorage_raw/partitions/0/state
> > > (kafka.utils.ZkUtils$)
> > > [2017-07-21 16:03:29,142] ERROR Conditional update of path
> > > /brokers/topics/perf_dstorage_raw/partitions/0/state with data
> > > {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]}
> > > and expected version 0 failed due to
> > > org.apache.zookeeper.KeeperException$BadVersionException:
> > > KeeperErrorCode = BadVersion for
> > > /brokers/topics/perf_dstorage_raw/partitions/0/state
> > > (kafka.utils.ZkUtils$)
> > >
> > > So, the issue is not isolated to just these "should-have-been-removed"
> > > topics, unfortunately.
> > >
> > > Really appreciate the input so far everyone.  Still looking though for
> > > a solution.  Many thanks. :)
> > >
> > > Chris
> > >
> > > On Fri, Jul 21, 2017 at 10:58 AM, M. Manna <manme...@gmail.com> wrote:
> > >
> > > > Just to add (in case the platform is Windows)
> > > >
> > > > For Windows-based clusters, log/topic cleanup doesn't work out of
> > > > the box. Users are more or less aware of it and do their own
> > > > maintenance as a workaround.
> > > > If topic deletion is not working properly on Windows (i.e. with
> > > > topic deletion enabled and all other settings in place), then you
> > > > have to manually delete the files.
> > > >
> > > >
> > > > On 21 July 2017 at 16:53, Chris Neal <cwn...@gmail.com> wrote:
> > > >
> > > > > @Carl,
> > > > >
> > > > > There is nothing under /admin/delete_topics other than
> > > > >
> > > > > []
> > > > >
> > > > > And nothing under /admin other than delete_topics :)
> > > > >
> > > > > The topics DO exist, however, under /config/topics!  We may be on
> > > > > to something.  I will remove them here and see if that clears it up.
> > > > >
> > > > > Thanks so much for all the help!
> > > > > Chris
> > > > >
> > > > > On Thu, Jul 20, 2017 at 10:37 PM, Chris Neal <cwn...@gmail.com>
> > wrote:
> > > > >
> > > > > > Thanks again for the replies.  VERY much appreciated.  I'll check
> > > > > > both /admin/delete_topics and /config/topics.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Thu, Jul 20, 2017 at 9:22 PM, Carl Haferd
> > > > <chaf...@groupon.com.invalid
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> If delete normally works, there would hopefully be some log
> > entries
> > > > when
> > > > > >> it
> > > > > >> fails.  Are there any unusual zookeeper entries in the
> > > > > >> /admin/delete_topics
> > > > > >> path or in the other /admin folders?
> > > > > >>
> > > > > >> Does the topic name still exist in zookeeper under /config/topics?
> > > > > >> If so, that should probably be deleted as well.
> > > > > >>
> > > > > >> Carl
> > > > > >>
> > > > > >> On Thu, Jul 20, 2017 at 6:42 PM, Chris Neal <cwn...@gmail.com>
> > > wrote:
> > > > > >>
> > > > > >> > Delete is definitely there.  The delete worked fine, based on
> > the
> > > > fact
> > > > > >> that
> > > > > >> > there is nothing in Zookeeper, and that the controller
> reported
> > > that
> > > > > the
> > > > > >> > delete was successful, it's just something seems to have
> gotten
> > > out
> > > > of
> > > > > >> > sync.
> > > > > >> >
> > > > > >> > delete.topic.enable is true.  I've successfully deleted topics
> > > > > >> > in the past, so I know it *should* work. :)
> > > > > >> >
> > > > > >> > I also had already checked in Zookeeper, and there is no
> > directory
> > > > for
> > > > > >> the
> > > > > >> > topics under /brokers/topics....  Very strange indeed.
> > > > > >> >
> > > > > >> > If I just remove the log directories from the filesystem, is
> > that
> > > > > >> enough to
> > > > > >> > get the broker to stop asking about the topics?  I would guess
> > > there
> > > > > >> would
> > > > > >> > need to be more than just that, but I could be wrong.
> > > > > >> >
> > > > > >> > Thanks guys for the suggestions though!
> > > > > >> >
> > > > > >> > On Thu, Jul 20, 2017 at 8:19 PM, Stephen Powis <
> > > > spo...@salesforce.com
> > > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > I could be totally wrong, but I seem to recall that delete
> > > wasn't
> > > > > >> fully
> > > > > >> > > implemented in 0.8.x?
> > > > > >> > >
> > > > > >> > > On Fri, Jul 21, 2017 at 10:10 AM, Carl Haferd
> > > > > >> > <chaf...@groupon.com.invalid
> > > > > >> > > >
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Chris,
> > > > > >> > > >
> > > > > >> > > > You could first check to make sure that
> delete.topic.enable
> > is
> > > > > true
> > > > > >> and
> > > > > >> > > try
> > > > > >> > > > deleting again if not.  If that doesn't work with 0.8.1.1
> > you
> > > > > might
> > > > > >> > need
> > > > > >> > > to
> > > > > >> > > > manually remove the topic's log files from the configured
> > > > log.dirs
> > > > > >> > folder
> > > > > >> > > > on each broker in addition to removing the topic's
> zookeeper
> > > > path.
> > > > > >> > > >
> > > > > >> > > > Carl
> > > > > >> > > >
> > > > > >> > > > On Thu, Jul 20, 2017 at 10:06 AM, Chris Neal <
> > > cwn...@gmail.com>
> > > > > >> wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi all,
> > > > > >> > > > >
> > > > > >> > > > > I have a weird situation here.  I have deleted a few
> > topics
> > > on
> > > > > my
> > > > > >> > > 0.8.1.1
> > > > > >> > > > > cluster (old, I know...).  The deletes succeeded
> according
> > > to
> > > > > the
> > > > > >> > > > > controller.log:
> > > > > >> > > > >
> > > > > >> > > > > [2017-07-20 16:40:31,175] INFO [TopicChangeListener on
> > > > > Controller
> > > > > >> 1]:
> > > > > >> > > New
> > > > > >> > > > > topics: [Set()], deleted topics:
> > > > > >> > > > > [Set(perf_doorway-supplier-adapter-uat_raw)], new
> > partition
> > > > > >> replica
> > > > > >> > > > > assignment [Map()]
> > > > > >> > > > > (kafka.controller.PartitionStateMachine$
> > > TopicChangeListener)
> > > > > >> > > > > [2017-07-20 16:40:33,507] INFO [TopicChangeListener on
> > > > > Controller
> > > > > >> 1]:
> > > > > >> > > New
> > > > > >> > > > > topics: [Set()], deleted topics:
> > > > > >> > > > > [Set(perf_doorway-supplier-scheduler-uat_raw)], new
> > > partition
> > > > > >> > replica
> > > > > >> > > > > assignment [Map()]
> > > > > >> > > > > (kafka.controller.PartitionStateMachine$
> > > TopicChangeListener)
> > > > > >> > > > > [2017-07-20 16:40:36,504] INFO [TopicChangeListener on
> > > > > Controller
> > > > > >> 1]:
> > > > > >> > > New
> > > > > >> > > > > topics: [Set()], deleted topics:
> > > > [Set(perf_gocontent-uat_raw)],
> > > > > >> new
> > > > > >> > > > > partition replica assignment [Map()]
> > > > > >> > > > > (kafka.controller.PartitionStateMachine$
> > > TopicChangeListener)
> > > > > >> > > > > [2017-07-20 16:40:38,290] INFO [TopicChangeListener on
> > > > > Controller
> > > > > >> 1]:
> > > > > >> > > New
> > > > > >> > > > > topics: [Set()], deleted topics:
> > > > [Set(perf_goplatform-uat_raw)]
> > > > > ,
> > > > > >> new
> > > > > >> > > > > partition replica assignment [Map()]
> > > > > >> > > > > (kafka.controller.PartitionStateMachine$
> > > TopicChangeListener)
> > > > > >> > > > >
> > > > > >> > > > > I query Zookeeper and the path is not there under
> > > > > /brokers/topics
> > > > > >> as
> > > > > >> > > > well.
> > > > > >> > > > >
> > > > > >> > > > > But, one of the nodes in my cluster continues to try and
> > use
> > > > > them:
> > > > > >> > > > >
> > > > > >> > > > > [2017-07-20 17:04:36,723] ERROR Conditional update of
> path
> > > > > >> > > > > /brokers/topics/perf_doorway-
> supplier-scheduler-uat_raw/
> > > > > >> > > > partitions/3/state
> > > > > >> > > > > with data
> > > > > >> > > > > {"controller_epoch":34,"leader":1,"version":1,"leader_
> > > > > >> > > > > epoch":2,"isr":[1,0]}
> > > > > >> > > > > and expected version 69 failed due to
> > > > > >> > > > > org.apache.zookeeper.KeeperException$NoNodeException:
> > > > > >> > KeeperErrorCode
> > > > > >> > > =
> > > > > >> > > > > NoNode for
> > > > > >> > > > > /brokers/topics/perf_doorway-
> supplier-scheduler-uat_raw/
> > > > > >> > > > partitions/3/state
> > > > > >> > > > > (kafka.utils.ZkUtils$)
> > > > > >> > > > > [2017-07-20 17:04:36,723] INFO Partition
> > > > > >> > > > > [perf_doorway-supplier-scheduler-uat_raw,3] on broker
> 1:
> > > > Cached
> > > > > >> > > > zkVersion
> > > > > >> > > > > [69] not equal to that in zookeeper, skip updating ISR
> > > > > >> > > > > (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,723] INFO Partition
> > > > > >> > > > > [perf_doorway-supplier-scheduler-uat_raw,3] on broker
> 1:
> > > > Cached
> > > > > >> > > > zkVersion
> > > > > >> > > > > [69] not equal to that in zookeeper, skip updating ISR
> > > > > >> > > > > (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,764] INFO Partition
> > > > > >> [perf_goplatform-uat_raw,2]
> > > > > >> > on
> > > > > >> > > > > broker 1: Shrinking ISR for partition
> > > > > [perf_goplatform-uat_raw,2]
> > > > > >> > from
> > > > > >> > > > 1,0
> > > > > >> > > > > to 1 (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,764] INFO Partition
> > > > > >> [perf_goplatform-uat_raw,2]
> > > > > >> > on
> > > > > >> > > > > broker 1: Shrinking ISR for partition
> > > > > [perf_goplatform-uat_raw,2]
> > > > > >> > from
> > > > > >> > > > 1,0
> > > > > >> > > > > to 1 (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,765] ERROR Conditional update of
> path
> > > > > >> > > > > /brokers/topics/perf_goplatform-uat_raw/partitions/
> > 2/state
> > > > with
> > > > > >> data
> > > > > >> > > > > {"controller_epoch":34,"leader":1,"version":1,"leader_
> > > > > >> > > > epoch":2,"isr":[1]}
> > > > > >> > > > > and expected version 70 failed due to
> > > > > >> > > > > org.apache.zookeeper.KeeperException$NoNodeException:
> > > > > >> > KeeperErrorCode
> > > > > >> > > =
> > > > > >> > > > > NoNode for /brokers/topics/perf_
> > > > goplatform-uat_raw/partitions/
> > > > > >> > 2/state
> > > > > >> > > > > (kafka.utils.ZkUtils$)
> > > > > >> > > > > [2017-07-20 17:04:36,765] ERROR Conditional update of
> path
> > > > > >> > > > > /brokers/topics/perf_goplatform-uat_raw/partitions/
> > 2/state
> > > > with
> > > > > >> data
> > > > > >> > > > > {"controller_epoch":34,"leader":1,"version":1,"leader_
> > > > > >> > > > epoch":2,"isr":[1]}
> > > > > >> > > > > and expected version 70 failed due to
> > > > > >> > > > > org.apache.zookeeper.KeeperException$NoNodeException:
> > > > > >> > KeeperErrorCode
> > > > > >> > > =
> > > > > >> > > > > NoNode for /brokers/topics/perf_
> > > > goplatform-uat_raw/partitions/
> > > > > >> > 2/state
> > > > > >> > > > > (kafka.utils.ZkUtils$)
> > > > > >> > > > > [2017-07-20 17:04:36,765] INFO Partition
> > > > > >> [perf_goplatform-uat_raw,2]
> > > > > >> > on
> > > > > >> > > > > broker 1: Cached zkVersion [70] not equal to that in
> > > > zookeeper,
> > > > > >> skip
> > > > > >> > > > > updating ISR (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,765] INFO Partition
> > > > > >> [perf_goplatform-uat_raw,2]
> > > > > >> > on
> > > > > >> > > > > broker 1: Cached zkVersion [70] not equal to that in
> > > > zookeeper,
> > > > > >> skip
> > > > > >> > > > > updating ISR (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,981] INFO Partition
> > > > > >> [perf_gocontent-uat_raw,1]
> > > > > >> > on
> > > > > >> > > > > broker 1: Shrinking ISR for partition
> > > > [perf_gocontent-uat_raw,1]
> > > > > >> from
> > > > > >> > > 1,0
> > > > > >> > > > > to 1 (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,981] INFO Partition
> > > > > >> [perf_gocontent-uat_raw,1]
> > > > > >> > on
> > > > > >> > > > > broker 1: Shrinking ISR for partition
> > > > [perf_gocontent-uat_raw,1]
> > > > > >> from
> > > > > >> > > 1,0
> > > > > >> > > > > to 1 (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,988] ERROR Conditional update of
> path
> > > > > >> > > > > /brokers/topics/perf_gocontent-uat_raw/partitions/
> 1/state
> > > > with
> > > > > >> data
> > > > > >> > > > > {"controller_epoch":34,"leader":1,"version":1,"leader_
> > > > > >> > > > epoch":4,"isr":[1]}
> > > > > >> > > > > and expected version 90 failed due to
> > > > > >> > > > > org.apache.zookeeper.KeeperException$NoNodeException:
> > > > > >> > KeeperErrorCode
> > > > > >> > > =
> > > > > >> > > > > NoNode for /brokers/topics/perf_gocontent
> > > > > >> -uat_raw/partitions/1/state
> > > > > >> > > > > (kafka.utils.ZkUtils$)
> > > > > >> > > > > [2017-07-20 17:04:36,988] ERROR Conditional update of
> path
> > > > > >> > > > > /brokers/topics/perf_gocontent-uat_raw/partitions/
> 1/state
> > > > with
> > > > > >> data
> > > > > >> > > > > {"controller_epoch":34,"leader":1,"version":1,"leader_
> > > > > >> > > > epoch":4,"isr":[1]}
> > > > > >> > > > > and expected version 90 failed due to
> > > > > >> > > > > org.apache.zookeeper.KeeperException$NoNodeException:
> > > > > >> > KeeperErrorCode
> > > > > >> > > =
> > > > > >> > > > > NoNode for /brokers/topics/perf_gocontent
> > > > > >> -uat_raw/partitions/1/state
> > > > > >> > > > > (kafka.utils.ZkUtils$)
> > > > > >> > > > > [2017-07-20 17:04:36,988] INFO Partition
> > > > > >> [perf_gocontent-uat_raw,1]
> > > > > >> > on
> > > > > >> > > > > broker 1: Cached zkVersion [90] not equal to that in
> > > > zookeeper,
> > > > > >> skip
> > > > > >> > > > > updating ISR (kafka.cluster.Partition)
> > > > > >> > > > > [2017-07-20 17:04:36,988] INFO Partition
> > > > > >> [perf_gocontent-uat_raw,1]
> > > > > >> > on
> > > > > >> > > > > broker 1: Cached zkVersion [90] not equal to that in
> > > > zookeeper,
> > > > > >> skip
> > > > > >> > > > > updating ISR (kafka.cluster.Partition)
> > > > > >> > > > >
> > > > > >> > > > > I've tried a rolling restart of the cluster to see if
> that
> > > > fixed
> > > > > >> it,
> > > > > >> > > but
> > > > > >> > > > it
> > > > > >> > > > > did not.
> > > > > >> > > > >
> > > > > >> > > > > Can someone please help me out here?  I'm not sure how I
> > can
> > > > get
> > > > > >> > things
> > > > > >> > > > > back in sync.
> > > > > >> > > > >
> > > > > >> > > > > Thank you so much for your time.
> > > > > >> > > > > Chris
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
