Charity,

I'm not sure about the specific problem you are having, but regarding Kafka on 
AWS, Netflix did a meetup talk about their Kafka installation there. There 
might be some useful information in it. There is a video stream as well as 
slides, and maybe you can get in touch with the speakers. Look in the comment 
section of the meetup page for links to the slides and video. 

Kafka at Netflix
http://www.meetup.com//http-kafka-apache-org/events/220355031/?showDescription=true

There's also a talk about running Kafka on Mesos, which might be relevant.

Kafka on Mesos
http://www.meetup.com//http-kafka-apache-org/events/222537743/?showDescription=true

-James

Sent from my iPhone

> On Jul 2, 2016, at 5:15 PM, Charity Majors <char...@hound.sh> wrote:
> 
> Gwen, thanks for the response.
> 
>> 1.1 Your life may be a bit simpler if you have a way of starting a new
>> broker with the same ID as the old one - this means it will
>> automatically pick up the old replicas and you won't need to
>> rebalance. Makes life slightly easier in some cases.
> 
> Yeah, this is definitely doable, I just don't *want* to do it.  I really
> want all of these to share the same code path: 1) rolling all nodes in an
> ASG to pick up a new AMI, 2) hardware failure / unintentional node
> termination, 3) resizing the ASG and rebalancing the data across nodes.
> 
> Everything but the first one means generating new node IDs, so I would
> rather just do that across the board.  It's the solution that really fits
> the ASG model best, so I'm reluctant to give up on it.
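> 
> (Generating the IDs themselves is the easy part.  Something like this
> hypothetical sketch, hashing the EC2 instance ID down to a numeric
> broker.id, is the kind of thing I mean; it's not what we actually run.)
> 
>     // genbrokerid.go: hypothetical sketch that derives a numeric broker.id
>     // from the EC2 instance ID, so every replacement node gets its own ID.
>     package main
> 
>     import (
>         "fmt"
>         "hash/fnv"
>         "io/ioutil"
>         "net/http"
>     )
> 
>     func main() {
>         // Every EC2 instance can read its own ID from the metadata service.
>         resp, err := http.Get("http://169.254.169.254/latest/meta-data/instance-id")
>         if err != nil {
>             panic(err)
>         }
>         defer resp.Body.Close()
>         instanceID, _ := ioutil.ReadAll(resp.Body)
> 
>         // Hash it into a positive int31 so it fits Kafka's broker.id.
>         h := fnv.New32a()
>         h.Write(instanceID)
>         fmt.Printf("broker.id=%d\n", h.Sum32()&0x7fffffff)
>     }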
> 
> 
>> 1.2 Careful not to rebalance too many partitions at once - you only
>> have so much bandwidth and currently Kafka will not throttle
>> rebalancing traffic.
> 
> Nod, got it.  This is def something I plan to work on hardening once I have
> the basic nut of things working (or if I've had to give up on it and accept
> a lesser solution).
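> 
> (For the record, the JSON that the stock kafka-reassign-partitions.sh tool
> takes is simple enough to generate in small batches.  Rough sketch below,
> capping how many partitions get moved per run; topic names, partition
> lists, and broker IDs are all made up.)
> 
>     // reassign_batch.go: rough sketch that emits reassignment JSON for at
>     // most batchSize partitions per run, so a rebalance can't saturate the
>     // network.  Feed it to kafka-reassign-partitions.sh
>     // --reassignment-json-file.
>     package main
> 
>     import (
>         "encoding/json"
>         "fmt"
>     )
> 
>     type move struct {
>         Topic     string  `json:"topic"`
>         Partition int32   `json:"partition"`
>         Replicas  []int32 `json:"replicas"`
>     }
> 
>     type reassignment struct {
>         Version    int    `json:"version"`
>         Partitions []move `json:"partitions"`
>     }
> 
>     func main() {
>         const batchSize = 5 // only move a handful of partitions at a time
> 
>         // Pretend these are the partitions that still need to move (made up).
>         pending := []move{
>             {Topic: "events.prod", Partition: 0, Replicas: []int32{1001, 1003, 1004}},
>             {Topic: "events.prod", Partition: 1, Replicas: []int32{1003, 1004, 1005}},
>         }
> 
>         if len(pending) > batchSize {
>             pending = pending[:batchSize]
>         }
>         out, _ := json.Marshal(reassignment{Version: 1, Partitions: pending})
>         fmt.Println(string(out))
>     }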
> 
> 
>> 2. I think your rebalance script is not rebalancing the offsets topic?
>> It still has a replica on broker 1002. You have two good replicas, so
>> you are nowhere near disaster, but make sure you get this working
>> too.
> 
> Yes, this is another problem I am working on in parallel.  The Shopify
> sarama library <https://godoc.org/github.com/Shopify/sarama> uses the
> __consumer_offsets topic, but it does *not* let you rebalance or resize the
> topic when consumers connect, disconnect, or restart.
> 
> "Note that Sarama's Consumer implementation does not currently support
> automatic consumer-group rebalancing and offset tracking"
> 
> I'm working on trying to get sarama-cluster to do something here.  I
> think these problems are likely related; I'm not sure wtf you are
> *supposed* to do to rebalance this god damn topic.  It also seems like
> we aren't using a consumer group, which sarama-cluster depends on to
> rebalance a topic.  I'm still pretty confused by the 0.9 "consumer
> group" stuff.
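> 
> For reference, the kind of thing I'm trying to get working with
> sarama-cluster is roughly this (untested sketch; broker addresses, group
> name, and topic are placeholders):
> 
>     // consume_group.go: rough, untested sketch of consuming through
>     // sarama-cluster, which manages the 0.9-style consumer group membership
>     // and partition rebalancing, committing offsets to __consumer_offsets.
>     package main
> 
>     import (
>         "log"
> 
>         cluster "github.com/bsm/sarama-cluster"
>     )
> 
>     func main() {
>         config := cluster.NewConfig()
>         config.Consumer.Return.Errors = true
>         config.Group.Return.Notifications = true
> 
>         consumer, err := cluster.NewConsumer(
>             []string{"kafka-1:9092", "kafka-2:9092"}, // placeholder brokers
>             "my-consumer-group",                      // consumer group name
>             []string{"events.prod"},                  // placeholder topic
>             config,
>         )
>         if err != nil {
>             log.Fatal(err)
>         }
>         defer consumer.Close()
> 
>         for {
>             select {
>             case msg := <-consumer.Messages():
>                 // ... handle msg ...
>                 consumer.MarkOffset(msg, "") // mark processed; committed periodically
>             case err := <-consumer.Errors():
>                 log.Println("error:", err)
>             case note := <-consumer.Notifications():
>                 log.Println("rebalanced:", note)
>             }
>         }
>     }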
> 
> Seriously considering downgrading to the latest 0.8 release, because
> there's a massive gap in documentation for the new stuff in 0.9 (like
> consumer groups) and we don't really need any of the new features.
> 
>> A common work-around is to configure the consumer to handle the "offset
>> out of range" exception by jumping to the last offset available in the
>> log. This is the behavior of the Java client, and it would have saved
>> your consumer here. Go client looks very low level, so I don't know
>> how easy it is to do that.
> 
> Erf, this seems like it would almost guarantee data loss.  :(  Will check
> it out tho.
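> 
> (For my own notes, the low-level version of that with plain sarama looks
> something like the rough, untested sketch below, falling back to the
> oldest offset still in the log rather than the newest so we lose as
> little as possible.  Broker address, topic, partition, and the saved
> offset are placeholders.)
> 
>     // offset_reset.go: rough, untested sketch.  If our saved offset has
>     // fallen out of the log, restart the partition consumer at the oldest
>     // retained offset (use sarama.OffsetNewest to mimic the Java client).
>     package main
> 
>     import (
>         "log"
> 
>         "github.com/Shopify/sarama"
>     )
> 
>     func main() {
>         consumer, err := sarama.NewConsumer([]string{"kafka-1:9092"}, sarama.NewConfig())
>         if err != nil {
>             log.Fatal(err)
>         }
>         defer consumer.Close()
> 
>         // Pretend this came from wherever we track consumer progress.
>         savedOffset := int64(123456)
> 
>         pc, err := consumer.ConsumePartition("events.prod", 0, savedOffset)
>         if err == sarama.ErrOffsetOutOfRange {
>             // Saved offset is no longer retained; jump to the oldest one left.
>             pc, err = consumer.ConsumePartition("events.prod", 0, sarama.OffsetOldest)
>         }
>         if err != nil {
>             log.Fatal(err)
>         }
>         defer pc.Close()
> 
>         for msg := range pc.Messages() {
>             log.Printf("partition=%d offset=%d", msg.Partition, msg.Offset)
>         }
>     }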
> 
>> If I were you, I'd retest your ASG scripts without the auto leader
>> election - since your own scripts can / should handle that.
> 
> Okay, this is straightforward enough.  Will try it.  And will keep trying
> to figure out how to balance the __consumer_offsets topic, since I
> increasingly think that's the key to this giant mess.
> 
> If anyone has any advice there, massively appreciated.
> 
> Thanks,
> 
> charity.
