Gotcha. Thanks again. Will post back once I've tried this with an update!

Chris

On Fri, Jul 21, 2017 at 1:12 PM, Carl Haferd <chaf...@groupon.com.invalid> wrote:

I would recommend allowing each broker enough time to catch up before starting the next, but this may be less of a concern if the entire cluster is being brought down and then started from scratch. To automate, we poll until the Kafka process binds to its configured port (9092), and then once all brokers are up we also poll until there are no longer any under-replicated partitions before performing further maintenance.

Carl

On Fri, Jul 21, 2017 at 10:50 AM, Chris Neal <cwn...@gmail.com> wrote:

Thanks Carl.

Always fun to do this stuff in production... ;)

Appreciate the input. I'll try a full cycle and see how that works.

In your opinion, if I stop all brokers and all Zookeeper nodes, then restart all Zookeepers... at that point can I start both brokers at the same time, or should I let one broker fully start and read all the unflushed segments from disk before starting the second broker?

Again, many thanks.
Chris

On Fri, Jul 21, 2017 at 12:13 PM, Carl Haferd <chaf...@groupon.com.invalid> wrote:

I have encountered similar difficulties in a test environment, and it may be necessary to stop the Kafka process on each broker and take Zookeeper offline before removing the files and zookeeper paths. Otherwise there may be a race condition between brokers which could cause the cluster to retain information for the topic.

Carl

On Fri, Jul 21, 2017 at 9:06 AM, Chris Neal <cwn...@gmail.com> wrote:

Welp. Surprisingly, that did not fix the problem. :(

I cleaned out all the entries for these topics from /config/topics, and removed the logs from the file system for those topics, and the messages are still flying by in the server.log file.
Also, more concerning: when I was looking through the log files for the other broker in the cluster, I noticed the same type of message for a topic that should actually be there:

[2017-07-21 16:03:29,140] ERROR Conditional update of path /brokers/topics/perf_dstorage_raw/partitions/4/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]} and expected version 0 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/perf_dstorage_raw/partitions/4/state (kafka.utils.ZkUtils$)
[2017-07-21 16:03:29,142] ERROR Conditional update of path /brokers/topics/perf_dstorage_raw/partitions/0/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]} and expected version 0 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/perf_dstorage_raw/partitions/0/state (kafka.utils.ZkUtils$)
[2017-07-21 16:03:29,142] ERROR Conditional update of path /brokers/topics/perf_dstorage_raw/partitions/0/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":0,"isr":[0]} and expected version 0 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/perf_dstorage_raw/partitions/0/state (kafka.utils.ZkUtils$)

So the issue is not isolated to just these "should-have-been-removed" topics, unfortunately.

Really appreciate the input so far, everyone. Still looking for a solution. Many thanks. :)

Chris

On Fri, Jul 21, 2017 at 10:58 AM, M. Manna <manme...@gmail.com> wrote:

Just to add (in case the platform is Windows):

For Windows-based cluster implementations, log/topic cleanup doesn't work out of the box. Users are more or less aware of it and do their own maintenance as a workaround. If topic deletion is not working properly on Windows (i.e. with topic deletion enabled and all other settings correct), then you have to delete the files manually.

On 21 July 2017 at 16:53, Chris Neal <cwn...@gmail.com> wrote:

@Carl,

There is nothing under /admin/delete_topics other than

[]

And nothing under /admin other than delete_topics :)

The topics DO exist, however, under /config/topics! We may be on to something. I will remove them here and see if that clears it up.

Thanks so much for all the help!
Chris

On Thu, Jul 20, 2017 at 10:37 PM, Chris Neal <cwn...@gmail.com> wrote:

Thanks again for the replies. VERY much appreciated. I'll check both /admin/delete_topics and /config/topics.

Chris

On Thu, Jul 20, 2017 at 9:22 PM, Carl Haferd <chaf...@groupon.com.invalid> wrote:

If delete normally works, there would hopefully be some log entries when it fails. Are there any unusual zookeeper entries in the /admin/delete_topics path or in the other /admin folders?

Does the topic name still exist in zookeeper under /config/topics?
If so, that should probably be deleted as well.

Carl

On Thu, Jul 20, 2017 at 6:42 PM, Chris Neal <cwn...@gmail.com> wrote:

Delete is definitely there. The delete worked fine, based on the fact that there is nothing in Zookeeper and that the controller reported that the delete was successful; it's just that something seems to have gotten out of sync.

delete.topic.enabled is true. I've successfully deleted topics in the past, so I know it *should* work. :)

I also had already checked in Zookeeper, and there is no directory for the topics under /brokers/topics.... Very strange indeed.

If I just remove the log directories from the filesystem, is that enough to get the broker to stop asking about the topics? I would guess there would need to be more than just that, but I could be wrong.

Thanks guys for the suggestions though!

On Thu, Jul 20, 2017 at 8:19 PM, Stephen Powis <spo...@salesforce.com> wrote:

I could be totally wrong, but I seem to recall that delete wasn't fully implemented in 0.8.x?
On Fri, Jul 21, 2017 at 10:10 AM, Carl Haferd <chaf...@groupon.com.invalid> wrote:

Chris,

You could first check to make sure that delete.topic.enable is true, and try deleting again if it wasn't. If that doesn't work with 0.8.1.1, you might need to manually remove the topic's log files from the configured log.dirs folder on each broker, in addition to removing the topic's zookeeper path.

Carl

On Thu, Jul 20, 2017 at 10:06 AM, Chris Neal <cwn...@gmail.com> wrote:

Hi all,

I have a weird situation here. I have deleted a few topics on my 0.8.1.1 cluster (old, I know...).
The deletes succeeded according to the controller.log:

[2017-07-20 16:40:31,175] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_doorway-supplier-adapter-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
[2017-07-20 16:40:33,507] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_doorway-supplier-scheduler-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
[2017-07-20 16:40:36,504] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_gocontent-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)
[2017-07-20 16:40:38,290] INFO [TopicChangeListener on Controller 1]: New topics: [Set()], deleted topics: [Set(perf_goplatform-uat_raw)], new partition replica assignment [Map()] (kafka.controller.PartitionStateMachine$TopicChangeListener)

I queried Zookeeper and the path is not there under /brokers/topics either.

But one of the nodes in my cluster continues to try to use them:

[2017-07-20 17:04:36,723] ERROR Conditional update of path /brokers/topics/perf_doorway-supplier-scheduler-uat_raw/partitions/3/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":2,"isr":[1,0]} and expected version 69 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_doorway-supplier-scheduler-uat_raw/partitions/3/state (kafka.utils.ZkUtils$)
[2017-07-20 17:04:36,723] INFO Partition [perf_doorway-supplier-scheduler-uat_raw,3] on broker 1: Cached zkVersion [69] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-07-20 17:04:36,723] INFO Partition [perf_doorway-supplier-scheduler-uat_raw,3] on broker 1: Cached zkVersion [69] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-07-20 17:04:36,764] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Shrinking ISR for partition [perf_goplatform-uat_raw,2] from 1,0 to 1 (kafka.cluster.Partition)
[2017-07-20 17:04:36,764] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Shrinking ISR for partition [perf_goplatform-uat_raw,2] from 1,0 to 1 (kafka.cluster.Partition)
[2017-07-20 17:04:36,765] ERROR Conditional update of path /brokers/topics/perf_goplatform-uat_raw/partitions/2/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":2,"isr":[1]} and expected version 70 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_goplatform-uat_raw/partitions/2/state (kafka.utils.ZkUtils$)
[2017-07-20 17:04:36,765] ERROR Conditional update of path /brokers/topics/perf_goplatform-uat_raw/partitions/2/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":2,"isr":[1]} and expected version 70 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_goplatform-uat_raw/partitions/2/state (kafka.utils.ZkUtils$)
[2017-07-20 17:04:36,765] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Cached zkVersion [70] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-07-20 17:04:36,765] INFO Partition [perf_goplatform-uat_raw,2] on broker 1: Cached zkVersion [70] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-07-20 17:04:36,981] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Shrinking ISR for partition [perf_gocontent-uat_raw,1] from 1,0 to 1 (kafka.cluster.Partition)
[2017-07-20 17:04:36,981] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Shrinking ISR for partition [perf_gocontent-uat_raw,1] from 1,0 to 1 (kafka.cluster.Partition)
[2017-07-20 17:04:36,988] ERROR Conditional update of path /brokers/topics/perf_gocontent-uat_raw/partitions/1/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":4,"isr":[1]} and expected version 90 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_gocontent-uat_raw/partitions/1/state (kafka.utils.ZkUtils$)
[2017-07-20 17:04:36,988] ERROR Conditional update of path /brokers/topics/perf_gocontent-uat_raw/partitions/1/state with data {"controller_epoch":34,"leader":1,"version":1,"leader_epoch":4,"isr":[1]} and expected version 90 failed due to org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/perf_gocontent-uat_raw/partitions/1/state (kafka.utils.ZkUtils$)
[2017-07-20 17:04:36,988] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Cached zkVersion [90] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2017-07-20 17:04:36,988] INFO Partition [perf_gocontent-uat_raw,1] on broker 1: Cached zkVersion [90] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

I've tried a rolling restart of the cluster to see if that fixed it, but it did not.

Can someone please help me out here? I'm not sure how I can get things back in sync.

Thank you so much for your time.
Chris
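The poll-until-healthy restart gate Carl describes earlier in the thread (wait for the broker to bind its port, then wait until the cluster reports no under-replicated partitions) could be sketched roughly as follows. This is a minimal sketch, not the poster's actual tooling: the hostnames, timeouts, and the assumption that kafka-topics.sh is on the PATH are all illustrative.

```python
# Sketch of the restart gate described in the thread: poll until a broker's
# port accepts connections, then poll until kafka-topics.sh reports no
# under-replicated partitions. Hosts, ports, and timeouts are assumptions.
import socket
import subprocess
import time

def wait_for_port(host, port, timeout=300.0, interval=2.0):
    """Poll until a TCP connect to host:port succeeds, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

def wait_for_no_under_replicated(zookeeper, timeout=600.0, interval=10.0):
    """Poll the topic tool until no under-replicated partitions are listed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kafka-topics.sh", "--describe",
             "--under-replicated-partitions", "--zookeeper", zookeeper],
            capture_output=True, text=True).stdout
        if not out.strip():  # empty output means nothing is under-replicated
            return True
        time.sleep(interval)
    return False
```

Usage would be: restart one broker, call `wait_for_port(broker_host, 9092)`, repeat for each broker, then call `wait_for_no_under_replicated("zk1:2181")` before any further maintenance.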
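The thread ends up touching three places where a deleted topic can linger in Zookeeper on 0.8.1.1: /brokers/topics (checked in the original post), /config/topics (where Chris found the stale entries), and /admin/delete_topics (Carl's first suggestion). A small helper that just enumerates those znodes for a topic, as a sketch; actually removing them (e.g. with zookeeper-shell.sh or a ZK client) is a separate step, and per Carl's later advice should be done with Kafka stopped.

```python
# Enumerate the zookeeper znodes named in the thread that may still
# reference a topic after deletion on 0.8.1.1. Deletion of these paths
# is intentionally left to the operator.

def zk_paths_for_topic(topic):
    """Zookeeper paths that may still reference a deleted topic."""
    return [
        "/brokers/topics/" + topic,       # partition/ISR state lives here
        "/config/topics/" + topic,        # per-topic config entry
        "/admin/delete_topics/" + topic,  # pending-delete marker
    ]
```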
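Carl's manual-cleanup step (removing the topic's log files from the configured log.dirs on each broker) relies on Kafka's on-disk layout, where each partition is a directory named `<topic>-<partition>` under a log.dirs entry. A hedged sketch of that cleanup, with a dry-run default so nothing is removed by accident; the function name and dry_run flag are illustrative, not from the thread.

```python
# Sketch of removing a deleted topic's partition directories
# (named "<topic>-<partition>") from each log.dirs entry.
# dry_run=True only reports what would be removed.
import glob
import os
import shutil

def remove_topic_logs(topic, log_dirs, dry_run=True):
    """Find (and unless dry_run, delete) partition dirs for `topic`."""
    removed = []
    for log_dir in log_dirs:
        for path in sorted(glob.glob(os.path.join(log_dir, topic + "-*"))):
            # Only match "<topic>-<number>", so cleaning topic "foo"
            # does not touch a directory for topic "foo-bar".
            suffix = os.path.basename(path)[len(topic) + 1:]
            if os.path.isdir(path) and suffix.isdigit():
                removed.append(path)
                if not dry_run:
                    shutil.rmtree(path)
    return removed
```

As the thread concludes, on 0.8.1.1 this only helps if the Kafka process (and, per Carl, Zookeeper) is stopped first; otherwise the brokers can race and re-create the topic state.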