Gotcha.  Thanks again.  Will post back with an update once I've tried this!

Chris

On Fri, Jul 21, 2017 at 1:12 PM, Carl Haferd <chaf...@groupon.com.invalid>
wrote:

> I would recommend allowing each broker enough time to catch up before
> starting the next, but this may be less of a concern if the entire cluster
> is being brought down and then started from scratch.  To automate, we poll
> until the Kafka process binds to its configured port (9092), and then once
> all brokers are up we also poll until there are no longer any
> under-replicated partitions before performing further maintenance.
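>
> For illustration, a rough sketch of that polling in Python (the broker
> hostnames, tool path, and zookeeper connect string below are placeholders,
> and this assumes your kafka-topics.sh supports
> --under-replicated-partitions):
>
>     import socket
>     import subprocess
>     import time
>
>     BROKERS = ["broker1", "broker2"]  # placeholder hostnames
>
>     def port_bound(host, port=9092):
>         # True once the Kafka process has bound its listener port.
>         try:
>             socket.create_connection((host, port), timeout=2).close()
>             return True
>         except OSError:
>             return False
>
>     def under_replicated():
>         # kafka-topics.sh prints one line per under-replicated partition;
>         # empty output means every replica has caught up.
>         out = subprocess.check_output(
>             ["/opt/kafka/bin/kafka-topics.sh", "--zookeeper", "zk1:2181",
>              "--describe", "--under-replicated-partitions"])
>         return out.strip() != b""
>
>     for host in BROKERS:
>         while not port_bound(host):
>             time.sleep(5)
>     while under_replicated():
>         time.sleep(5)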
>
> Carl
>
> On Fri, Jul 21, 2017 at 10:50 AM, Chris Neal <cwn...@gmail.com> wrote:
>
> > Thanks Carl.
> >
> > Always fun to do this stuff in production... ;)
> >
> > Appreciate the input.  I'll try a full cycle and see how that works.
> >
> > In your opinion, if I stop all brokers and all Zookeeper nodes, then
> > restart all Zookeepers...at that point can I start both brokers at the
> > same time, or should I let one broker fully start and read all the
> > unflushed segments from disk before starting the second broker?
> >
> > Again, many thanks.
> > Chris
> >
> >
> > On Fri, Jul 21, 2017 at 12:13 PM, Carl Haferd
> > <chaf...@groupon.com.invalid> wrote:
> >
> > > I have encountered similar difficulties in a test environment and it
> > > may be necessary to stop the Kafka process on each broker and take
> > > Zookeeper offline before removing the files and zookeeper paths.
> > > Otherwise there may be a race condition between brokers which could
> > > cause the cluster to retain information for the topic.
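> > >
> > > Once the brokers are stopped (and Zookeeper is back up, since a client
> > > needs a live ensemble to delete znodes), a minimal sketch of clearing a
> > > topic's paths with the Python kazoo client; the topic name and connect
> > > string are placeholders:
> > >
> > >     from kazoo.client import KazooClient
> > >
> > >     topic = "perf_gocontent-uat_raw"  # placeholder topic name
> > >     zk = KazooClient(hosts="zk1:2181")
> > >     zk.start()
> > >     for path in ("/brokers/topics/" + topic,
> > >                  "/config/topics/" + topic,
> > >                  "/admin/delete_topics/" + topic):
> > >         if zk.exists(path):
> > >             # recursive=True removes the node and all of its children
> > >             zk.delete(path, recursive=True)
> > >     zk.stop()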
> > >
> > > Carl
> > >
> > > On Fri, Jul 21, 2017 at 9:06 AM, Chris Neal <cwn...@gmail.com> wrote:
> > >
> > > > Welp.  Surprisingly, that did not fix the problem. :(
> > > >
> > > > I cleaned out all the entries for these topics from /config/topics, and
> > > > removed the logs from the file system for those topics, and the
> > > > messages are still flying by in the server.log file.
> > > >
> > > > Also, more concerning, when I was looking through the log files for the
> > > > other broker in the cluster, I noticed the same type of message for a
> > > > topic that should actually be there:
> > > >
> > > > [2017-07-21 16:03:29,140] ERROR Conditional update of path /brokers/topics/perf_dstorage_raw/partitions/4/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]} and expected version 0 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/perf_dstorage_raw/partitions/4/state (kafka.utils.ZkUtils$)
> > > > [2017-07-21 16:03:29,142] ERROR Conditional update of path /brokers/topics/perf_dstorage_raw/partitions/0/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]} and expected version 0 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/perf_dstorage_raw/partitions/0/state (kafka.utils.ZkUtils$)
> > > > [2017-07-21 16:03:29,142] ERROR Conditional update of path /brokers/topics/perf_dstorage_raw/partitions/0/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]} and expected version 0 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/perf_dstorage_raw/partitions/0/state (kafka.utils.ZkUtils$)
> > > >
> > > > So, the issue is not isolated to just these "should-have-been-removed"
> > > > topics, unfortunately.
> > > >
> > > > Really appreciate the input so far, everyone.  Still looking for a
> > > > solution, though.  Many thanks. :)
> > > >
> > > > Chris
> > > >
> > > > On Fri, Jul 21, 2017 at 10:58 AM, M. Manna <manme...@gmail.com> wrote:
> > > >
> > > > > Just to add (in case the platform is Windows):
> > > > >
> > > > > For Windows-based cluster implementations, log/topic cleanup doesn't
> > > > > work out of the box.  Users are more or less aware of it and are
> > > > > doing their own maintenance as a workaround.  If topic deletion is
> > > > > not working properly on Windows (i.e. even with topic deletion
> > > > > enabled and all other settings in place), then you have to delete
> > > > > the files manually.
> > > > >
> > > > >
> > > > > On 21 July 2017 at 16:53, Chris Neal <cwn...@gmail.com> wrote:
> > > > >
> > > > > > @Carl,
> > > > > >
> > > > > > There is nothing under /admin/delete_topics other than
> > > > > >
> > > > > > []
> > > > > >
> > > > > > And nothing under /admin other than delete_topics :)
> > > > > >
> > > > > > The topics DO exist, however, under /config/topics!  We may be on
> > > > > > to something.  I will remove them here and see if that clears it up.
> > > > > >
> > > > > > Thanks so much for all the help!
> > > > > > Chris
> > > > > >
> > > > > > On Thu, Jul 20, 2017 at 10:37 PM, Chris Neal <cwn...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks again for the replies.  VERY much appreciated.  I'll check
> > > > > > > both /admin/delete_topics and /config/topics.
> > > > > > >
> > > > > > > Chris
> > > > > > >
> > > > > > > On Thu, Jul 20, 2017 at 9:22 PM, Carl Haferd
> > > > > > > <chaf...@groupon.com.invalid> wrote:
> > > > > > >
> > > > > > >> If delete normally works, there would hopefully be some log
> > > > > > >> entries when it fails.  Are there any unusual zookeeper entries
> > > > > > >> in the /admin/delete_topics path or in the other /admin folders?
> > > > > > >>
> > > > > > >> Does the topic name still exist in zookeeper under
> > > > > > >> /config/topics?  If so, that should probably be deleted as well.
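> > > > > > >>
> > > > > > >> A quick sketch for peeking at those paths from Python with the
> > > > > > >> kazoo client (the connect string is a placeholder):
> > > > > > >>
> > > > > > >>     from kazoo.client import KazooClient
> > > > > > >>
> > > > > > >>     zk = KazooClient(hosts="zk1:2181")
> > > > > > >>     zk.start()
> > > > > > >>     # pending topic deletions, if any
> > > > > > >>     print(zk.get_children("/admin/delete_topics"))
> > > > > > >>     # per-topic config entries left behind
> > > > > > >>     print(zk.get_children("/config/topics"))
> > > > > > >>     zk.stop()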
> > > > > > >>
> > > > > > >> Carl
> > > > > > >>
> > > > > > >> On Thu, Jul 20, 2017 at 6:42 PM, Chris Neal <cwn...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Delete is definitely there.  The delete worked fine, based on
> > > > > > >> > the fact that there is nothing in Zookeeper and that the
> > > > > > >> > controller reported that the delete was successful; it's just
> > > > > > >> > that something seems to have gotten out of sync.
> > > > > > >> >
> > > > > > >> > delete.topic.enable is true.  I've successfully deleted topics
> > > > > > >> > in the past, so I know it *should* work. :)
> > > > > > >> >
> > > > > > >> > I had also already checked in Zookeeper, and there is no
> > > > > > >> > directory for the topics under /brokers/topics....  Very
> > > > > > >> > strange indeed.
> > > > > > >> >
> > > > > > >> > If I just remove the log directories from the filesystem, is
> > > > > > >> > that enough to get the broker to stop asking about the topics?
> > > > > > >> > I would guess there would need to be more than just that, but
> > > > > > >> > I could be wrong.
> > > > > > >> >
> > > > > > >> > Thanks guys for the suggestions though!
> > > > > > >> >
> > > > > > >> > On Thu, Jul 20, 2017 at 8:19 PM, Stephen Powis
> > > > > > >> > <spo...@salesforce.com> wrote:
> > > > > > >> >
> > > > > > >> > > I could be totally wrong, but I seem to recall that delete
> > > > > > >> > > wasn't fully implemented in 0.8.x?
> > > > > > >> > >
> > > > > > >> > > On Fri, Jul 21, 2017 at 10:10 AM, Carl Haferd
> > > > > > >> > > <chaf...@groupon.com.invalid> wrote:
> > > > > > >> > >
> > > > > > >> > > > Chris,
> > > > > > >> > > >
> > > > > > >> > > > You could first check to make sure that delete.topic.enable
> > > > > > >> > > > is true, and try deleting again if it wasn't.  If that
> > > > > > >> > > > doesn't work with 0.8.1.1, you might need to manually remove
> > > > > > >> > > > the topic's log files from the configured log.dirs folder on
> > > > > > >> > > > each broker, in addition to removing the topic's zookeeper
> > > > > > >> > > > path.
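> > > > > > >> > > >
> > > > > > >> > > > Roughly, with the broker stopped first, something like this
> > > > > > >> > > > on each broker (the log dir and topic name below are
> > > > > > >> > > > placeholders):
> > > > > > >> > > >
> > > > > > >> > > >     import os
> > > > > > >> > > >     import shutil
> > > > > > >> > > >
> > > > > > >> > > >     log_dir = "/var/kafka/logs"       # whatever log.dirs points at
> > > > > > >> > > >     topic = "perf_gocontent-uat_raw"  # placeholder topic name
> > > > > > >> > > >
> > > > > > >> > > >     for name in os.listdir(log_dir):
> > > > > > >> > > >         # partition directories are named <topic>-<partition>
> > > > > > >> > > >         if name.startswith(topic + "-"):
> > > > > > >> > > >             shutil.rmtree(os.path.join(log_dir, name))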
> > > > > > >> > > >
> > > > > > >> > > > Carl
> > > > > > >> > > >
> > > > > > >> > > > On Thu, Jul 20, 2017 at 10:06 AM, Chris Neal
> > > > > > >> > > > <cwn...@gmail.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi all,
> > > > > > >> > > > >
> > > > > > >> > > > > I have a weird situation here.  I have deleted a few
> > > > > > >> > > > > topics on my 0.8.1.1 cluster (old, I know...).  The
> > > > > > >> > > > > deletes succeeded according to the controller.log:
> > > > > > >> > > > >
> > > > > > >> > > > > [2017-07-20 16:40:31,175] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_doorway-supplier-adapter-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > > > > > >> > > > > [2017-07-20 16:40:33,507] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_doorway-supplier-scheduler-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > > > > > >> > > > > [2017-07-20 16:40:36,504] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_gocontent-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > > > > > >> > > > > [2017-07-20 16:40:38,290] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_goplatform-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
> > > > > > >> > > > >
> > > > > > >> > > > > I queried Zookeeper, and the paths are not there under
> > > > > > >> > > > > /brokers/topics either.
> > > > > > >> > > > >
> > > > > > >> > > > > But one of the nodes in my cluster continues to try to use
> > > > > > >> > > > > them:
> > > > > > >> > > > >
> > > > > > >> > > > > [2017-07-20 17:04:36,723] ERROR Conditional update of path /brokers/topics/perf_doorway-supplier-scheduler-uat_raw/partitions/3/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":2,"isr":[1,0]} and expected version 69 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_doorway-supplier-scheduler-uat_raw/partitions/3/state (kafka.utils.ZkUtils$)
> > > > > > >> > > > > [2017-07-20 17:04:36,723] INFO Partition [perf_doorway-supplier-scheduler-uat_raw,3] on broker 1: Cached zkVersion [69] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,723] INFO Partition [perf_doorway-supplier-scheduler-uat_raw,3] on broker 1: Cached zkVersion [69] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,764] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Shrinking ISR for partition [perf_goplatform-uat_raw,2] from 1,0 to 1 (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,764] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Shrinking ISR for partition [perf_goplatform-uat_raw,2] from 1,0 to 1 (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,765] ERROR Conditional update of path /brokers/topics/perf_goplatform-uat_raw/partitions/2/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":2,"isr":[1]} and expected version 70 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_goplatform-uat_raw/partitions/2/state (kafka.utils.ZkUtils$)
> > > > > > >> > > > > [2017-07-20 17:04:36,765] ERROR Conditional update of path /brokers/topics/perf_goplatform-uat_raw/partitions/2/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":2,"isr":[1]} and expected version 70 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_goplatform-uat_raw/partitions/2/state (kafka.utils.ZkUtils$)
> > > > > > >> > > > > [2017-07-20 17:04:36,765] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Cached zkVersion [70] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,765] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Cached zkVersion [70] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,981] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Shrinking ISR for partition [perf_gocontent-uat_raw,1] from 1,0 to 1 (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,981] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Shrinking ISR for partition [perf_gocontent-uat_raw,1] from 1,0 to 1 (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,988] ERROR Conditional update of path /brokers/topics/perf_gocontent-uat_raw/partitions/1/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":4,"isr":[1]} and expected version 90 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_gocontent-uat_raw/partitions/1/state (kafka.utils.ZkUtils$)
> > > > > > >> > > > > [2017-07-20 17:04:36,988] ERROR Conditional update of path /brokers/topics/perf_gocontent-uat_raw/partitions/1/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":4,"isr":[1]} and expected version 90 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_gocontent-uat_raw/partitions/1/state (kafka.utils.ZkUtils$)
> > > > > > >> > > > > [2017-07-20 17:04:36,988] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Cached zkVersion [90] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> > > > > > >> > > > > [2017-07-20 17:04:36,988] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Cached zkVersion [90] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> > > > > > >> > > > >
> > > > > > >> > > > > I've tried a rolling restart of the cluster to see if
> > > > > > >> > > > > that fixed it, but it did not.
> > > > > > >> > > > >
> > > > > > >> > > > > Can someone please help me out here?  I'm not sure how I
> > > > > > >> > > > > can get things back in sync.
> > > > > > >> > > > >
> > > > > > >> > > > > Thank you so much for your time.
> > > > > > >> > > > > Chris
