We're planning a Kafka deployment on AWS EC2, and I was hoping to get some advice on
best practices. I've seen the Loggly presentation [1], which has some good
recommendations on instance types and EBS setup. Aside from that, there
seem to be several options in terms of multi-Availability Zone (AZ)
deployment. The ones we're considering are:

1) Treat each AZ as a separate data center. Producers write to the Kafka
cluster in the same AZ. For consumption, there are two options:
1a) Designate one cluster the "master" cluster and use MirrorMaker. This
was discussed here [2], where some gotchas related to offset management were
raised.
1b) Build consumers to consume from both clusters (e.g. two Camus jobs, one
for each cluster).

Pros:
* If there's a network partition between AZs (or extra latency), the
consumer(s) will catch up once the event is resolved.
* If an AZ goes offline, only unprocessed data in that AZ is lost until the
AZ comes back online. The other AZ is unaffected. (Consumer failover seems
more complicated in 1a.)
Cons:
* Duplicate infrastructure and either more moving parts (1a) or more
complicated consumers (1b).
* It's unclear how this scales if one wants to add a second region to the
mix.
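
To make the "more complicated consumers" point in 1b concrete, here is a rough sketch of the bookkeeping I have in mind. It is not real Kafka client code: `MultiClusterOffsets`, `consume_all`, and the cluster names are hypothetical, and messages are modeled as plain tuples. The point is only that offsets are meaningful per cluster, so a consumer reading from both has to key its progress by (cluster, topic, partition).

```python
from collections import defaultdict


class MultiClusterOffsets:
    """Tracks the next offset to read per (cluster, topic, partition).

    Hypothetical helper for illustration; not part of any Kafka client.
    """

    def __init__(self):
        self._next = defaultdict(int)

    def record(self, cluster, topic, partition, offset):
        key = (cluster, topic, partition)
        # An offset only means something inside the cluster that assigned it,
        # so the cluster name has to be part of the key.
        self._next[key] = max(self._next[key], offset + 1)

    def next_offset(self, cluster, topic, partition):
        return self._next[(cluster, topic, partition)]


def consume_all(clusters, offsets):
    """Read every cluster in turn, skipping messages already processed.

    clusters: dict of cluster name -> iterable of
              (topic, partition, offset, payload) tuples.
    """
    processed = []
    for name, messages in clusters.items():
        for topic, partition, offset, payload in messages:
            if offset < offsets.next_offset(name, topic, partition):
                continue  # already handled in an earlier pass
            processed.append((name, payload))
            offsets.record(name, topic, partition, offset)
    return processed
```

If one AZ is unreachable, its entry can simply be left out of `clusters` for that pass; its stored offsets are untouched, so the consumer resumes where it left off once the AZ is back.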

2) The second option is to treat AZs as the same data center. In this case,
there's no guarantee that a writer is writing to a node in the same AZ.

Pros:
* Simplified setup: all data is in one place.
Cons:
* Harder to design for availability: what if the leader of the partition is
in a different AZ than the producer and there's a network partition between
AZs? If latency is high or throughput is low between AZs, write throughput
suffers if `request.required.acks` = -1.
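
A back-of-the-envelope way to see the throughput concern, under some loud assumptions: the producer is synchronous with one request in flight at a time, and each request has to wait for the round trip to the leader plus the round trip for the followers to acknowledge (which is what `request.required.acks` = -1 implies). The function name and all numbers below are made up for illustration, not measurements.

```python
def sync_write_throughput(batch_bytes, producer_leader_rtt_s, replication_rtt_s):
    """Rough upper bound on bytes/second for a synchronous producer.

    Assumes one batch in flight at a time, so each batch costs one
    producer->leader round trip plus one leader->follower round trip.
    """
    latency_s = producer_leader_rtt_s + replication_rtt_s
    return batch_bytes / latency_s


# Hypothetical round-trip times: 1 ms inside an AZ, 4 ms across AZs.
same_az = sync_write_throughput(100_000, 0.001, 0.001)
cross_az = sync_write_throughput(100_000, 0.004, 0.004)
slowdown = same_az / cross_az
```

An async or pipelined producer would hide part of this latency, so treat the ratio as a worst case for the single-in-flight scenario rather than a prediction.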


Some other considerations:
* ZooKeeper deployment: the best practice seems to be a 3-node cluster across 3
AZs, but option 1a/1b would let us run separate clusters per AZ.
* EBS / provisioned IOPS: the Loggly presentation predates Kafka 0.8
replication. Are folks using ephemeral storage instead of EBS now?
Provisioned IOPS can get expensive pretty quickly.
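
On the ZooKeeper point, the reasoning behind the 3-AZ layout is just majority-quorum arithmetic: the ensemble stays available only while strictly more than half of its nodes are up. A small sketch to check a layout (the function name and AZ labels are hypothetical):

```python
def has_quorum(nodes_per_az, failed_azs):
    """True if the surviving ZooKeeper nodes still form a strict majority.

    nodes_per_az: dict of AZ name -> number of ensemble members in that AZ.
    failed_azs:   set of AZ names assumed to be offline.
    """
    total = sum(nodes_per_az.values())
    alive = sum(n for az, n in nodes_per_az.items() if az not in failed_azs)
    return alive > total // 2


three_azs = {"az-a": 1, "az-b": 1, "az-c": 1}
two_azs = {"az-a": 2, "az-b": 1}

survives_any_single_az = all(has_quorum(three_azs, {az}) for az in three_azs)
survives_majority_az_loss = has_quorum(two_azs, {"az-a"})
```

With one node in each of 3 AZs, losing any single AZ leaves 2 of 3 nodes. With 3 nodes split across only 2 AZs, losing the AZ that holds 2 nodes drops the ensemble below a majority.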

Any suggestions/experience along these lines (or others!) would be greatly
appreciated. If there's good feedback, I'd be happy to put together a wiki
page with the details.

Thanks,
Joe

[1] http://search-hadoop.com/m/4TaT4BQRJy
[2] http://search-hadoop.com/m/4TaT49l0Gh/AWS+availability+zone/v=plain
