We're planning a Kafka deployment on AWS EC2, and I was hoping to get some advice on best practices. I've seen the Loggly presentation [1], which has some good recommendations on instance types and EBS setup. Aside from that, there seem to be several options for multi-Availability Zone (AZ) deployment. The ones we're considering are:

1) Treat each AZ as a separate data center. Producers write to the Kafka cluster in the same AZ. For consumption, there are two options:

   1a) Designate one cluster the "master" cluster and use MirrorMaker. This was discussed here [2], where some gotchas related to offset management were raised.
   1b) Build consumers to consume from both clusters (e.g. two Camus jobs, one for each cluster).

   Pros:
   * If there's a network partition between AZs (or extra latency), the consumer(s) will catch up once the event is resolved.
   * If an AZ goes offline, only unprocessed data in that AZ is lost until the AZ comes back online. The other AZ is unaffected. (Consumer failover is more complicated in 1a, it seems.)

   Cons:
   * Duplicate infrastructure and either more moving parts (1a) or more complicated consumers (1b).
   * It's unclear how this scales if one wants to add a second region to the mix.

2) Treat all AZs as the same data center. In this case, there's no guarantee that a writer is writing to a node in the same AZ.

   Pros:
   * Simplified setup: all data is in one place.

   Cons:
   * Harder to design for availability: what if the leader of the partition is in a different AZ than the producer and there's a network partition between AZs?
   * If latency is high or throughput is low between AZs, write throughput suffers if `request.required.acks` = -1.

Some other considerations:
* ZooKeeper deployment: the best practice seems to be a 3-node cluster across 3 AZs, but option 1a/1b would let us run separate clusters per AZ.
* EBS / provisioned IOPS: the Loggly presentation predates Kafka 0.8 replication. Are folks using ephemeral storage instead of EBS now? Provisioned IOPS can get expensive pretty quickly.

Any suggestions/experience along these lines (or others!) would be greatly appreciated. If there's good feedback, I'd be happy to put together a wiki page with the details.

Thanks,
Joe

[1] http://search-hadoop.com/m/4TaT4BQRJy
[2] http://search-hadoop.com/m/4TaT49l0Gh/AWS+availability+zone/v=plain
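
P.S. To make the producer side of the two options concrete, here is a rough sketch of how the producer settings might differ. It is only an illustration: the AZ names, broker hostnames, and the helper functions `producer_props` and `to_properties` are made up for this example. The only real names are the Kafka 0.8 producer properties `metadata.broker.list` and `request.required.acks`.

```python
# Hypothetical AZ names and broker addresses, for illustration only.
CLUSTERS_BY_AZ = {
    "us-east-1a": ["kafka-a1.example.internal:9092", "kafka-a2.example.internal:9092"],
    "us-east-1b": ["kafka-b1.example.internal:9092", "kafka-b2.example.internal:9092"],
}


def producer_props(local_az, single_cluster, acks="-1"):
    """Build Kafka 0.8-style producer properties for one producer.

    single_cluster=False models option 1: the producer only knows about
    the brokers in its own AZ.
    single_cluster=True models option 2: one cluster spans all AZs, so the
    bootstrap list contains brokers from every AZ and the partition leader
    may live in a different AZ than the producer.
    """
    if single_cluster:
        brokers = [b for az in sorted(CLUSTERS_BY_AZ) for b in CLUSTERS_BY_AZ[az]]
    else:
        brokers = list(CLUSTERS_BY_AZ[local_az])
    return {
        "metadata.broker.list": ",".join(brokers),
        # -1 waits for all in-sync replicas, which is where cross-AZ
        # latency would hurt write throughput in option 2.
        "request.required.acks": acks,
    }


def to_properties(props):
    """Render the settings as the lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    print(to_properties(producer_props("us-east-1a", single_cluster=False)))
    print()
    print(to_properties(producer_props("us-east-1a", single_cluster=True)))
```

In option 1 the acknowledgement wait stays inside one AZ; in option 2 the same `request.required.acks` = -1 setting can involve replicas in another AZ, which is the throughput concern raised above.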