Great talks, but not relevant to either of my problems -- the golang client not rebalancing the consumer offset topic, or the autoscaling group behavior (which I think is probably just a consequence of the first).
Thanks though, there's good stuff in here.

On Sun, Jul 3, 2016 at 10:23 AM, James Cheng <wushuja...@gmail.com> wrote:

> Charity,
>
> I'm not sure about the specific problem you are having, but about Kafka on
> AWS, Netflix did a talk at a meetup about their Kafka installation on AWS.
> There might be some useful information in there. There is a video stream as
> well as slides, and maybe you can get in touch with the speakers. Look in
> the comment section for links to the slides and video.
>
> Kafka at Netflix
> http://www.meetup.com//http-kafka-apache-org/events/220355031/?showDescription=true
>
> There's also a talk about running Kafka on Mesos, which might be relevant.
>
> Kafka on Mesos
> http://www.meetup.com//http-kafka-apache-org/events/222537743/?showDescription=true
>
> -James
>
> Sent from my iPhone
>
> > On Jul 2, 2016, at 5:15 PM, Charity Majors <char...@hound.sh> wrote:
> >
> > Gwen, thanks for the response.
> >
> >> 1.1 Your life may be a bit simpler if you have a way of starting a new
> >> broker with the same ID as the old one - this means it will
> >> automatically pick up the old replicas and you won't need to
> >> rebalance. Makes life slightly easier in some cases.
> >
> > Yeah, this is definitely doable, I just don't *want* to do it. I really
> > want all of these to share the same code path: 1) rolling all nodes in an
> > ASG to pick up a new AMI, 2) hardware failure / unintentional node
> > termination, 3) resizing the ASG and rebalancing the data across nodes.
> >
> > Everything but the first one means generating new node IDs, so I would
> > rather just do that across the board. It's the solution that really fits
> > the ASG model best, so I'm reluctant to give up on it.
> >
> >> 1.2 Careful not to rebalance too many partitions at once - you only
> >> have so much bandwidth and currently Kafka will not throttle
> >> rebalancing traffic.
> >
> > Nod, got it. This is def something I plan to work on hardening once I have
> > the basic nut of things working (or if I've had to give up on it and accept
> > a lesser solution).
> >
> >> 2. I think your rebalance script is not rebalancing the offsets topic?
> >> It still has a replica on broker 1002. You have two good replicas, so
> >> you are nowhere near disaster, but make sure you get this working
> >> too.
> >
> > Yes, this is another problem I am working on in parallel. The Shopify
> > sarama library <https://godoc.org/github.com/Shopify/sarama> uses the
> > __consumer_offsets topic, but it does *not* let you rebalance or resize the
> > topic when consumers connect, disconnect, or restart.
> >
> > "Note that Sarama's Consumer implementation does not currently support
> > automatic consumer-group rebalancing and offset tracking"
> >
> > I'm working on trying to get the sarama-cluster to do something here. I
> > think these problems are likely related, I'm not sure wtf you are
> > *supposed* to do to rebalance this god damn topic. It also seems like we
> > aren't using a consumer group which sarama-cluster depends on to rebalance
> > a topic. I'm still pretty confused by the 0.9 "consumer group" stuff.
> >
> > Seriously considering downgrading to the latest 0.8 release, because
> > there's a massive gap in documentation for the new stuff in 0.9 (like
> > consumer groups) and we don't really need any of the new features.
> >
> >> A common work-around is to configure the consumer to handle "offset
> >> out of range" exception by jumping to the last offset available in the
> >> log. This is the behavior of the Java client, and it would have saved
> >> your consumer here. Go client looks very low level, so I don't know
> >> how easy it is to do that.
> >
> > Erf, this seems like it would almost guarantee data loss. :( Will check
> > it out tho.
> >
> >> If I were you, I'd retest your ASG scripts without the auto leader
> >> election - since your own scripts can / should handle that.
> >
> > Okay, this is straightforward enough. Will try it. And will keep trying
> > to figure out how to balance the __consumer_offsets topic, since I
> > increasingly think that's the key to this giant mess.
> >
> > If anyone has any advice there, massively appreciated.
> >
> > Thanks,
> >
> > charity.
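
For reference, the "offset out of range" workaround quoted above could be sketched in Go roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not sarama code: PartitionLog, Open, and OpenWithFallback are hypothetical names, and ErrOffsetOutOfRange, OffsetNewest, and OffsetOldest are local stand-ins named after the values sarama exposes. The idea is simply to try the stored offset first and restart from a sentinel position only when the broker rejects it.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins (assumptions for this sketch), named after the values
// sarama exposes so the mapping to a real client is easy to see.
var ErrOffsetOutOfRange = errors.New("offset out of range")

const (
	OffsetNewest int64 = -1 // sentinel: start at the end of the log
	OffsetOldest int64 = -2 // sentinel: start at the oldest retained message
)

// PartitionLog is a hypothetical model of one partition: offsets in
// [Oldest, Newest) are readable, and Newest is the next offset to be written.
type PartitionLog struct {
	Oldest, Newest int64
}

// Open resolves a requested offset to a concrete starting position, or
// returns ErrOffsetOutOfRange if the requested offset is outside the log.
func (l PartitionLog) Open(offset int64) (int64, error) {
	switch {
	case offset == OffsetNewest:
		return l.Newest, nil
	case offset == OffsetOldest:
		return l.Oldest, nil
	case offset < l.Oldest || offset > l.Newest:
		return 0, ErrOffsetOutOfRange
	}
	return offset, nil
}

// OpenWithFallback tries the stored offset first and, only on an
// out-of-range error, retries from the fallback sentinel. The reset flag
// tells the caller that the stored position was abandoned.
func OpenWithFallback(l PartitionLog, stored, fallback int64) (start int64, reset bool, err error) {
	start, err = l.Open(stored)
	if errors.Is(err, ErrOffsetOutOfRange) {
		start, err = l.Open(fallback)
		return start, true, err
	}
	return start, false, err
}

func main() {
	part := PartitionLog{Oldest: 100, Newest: 200}
	// Stored offset 50 is no longer in the log, so the consumer restarts at the end.
	start, reset, err := OpenWithFallback(part, 50, OffsetNewest)
	fmt.Println(start, reset, err)
}
```

Falling back to OffsetNewest skips whatever was written between the stale offset and the end of the log, which is the data-loss concern raised in the thread. Passing OffsetOldest instead trades skipped messages for reprocessing, and the reset flag gives the caller a place to log or alert when either happens.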