Sorry to tune in a bit late, but here goes.

> 1. The producer since 0.8 is actually zookeeper free, so this is not new to
> this client it is true for the current client as well. Our experience was
> that direct zookeeper connections from zillions of producers wasn't a good
> idea for a number of reasons. Our intention is to remove this dependency
> from the consumer as well. The configuration in the producer doesn't need
> the full set of brokers, though, just one or two machines to bootstrap the
> state of the cluster from--in other words it isn't like you need to
> reconfigure your clients every time you add some servers. This is exactly
> how zookeeper works too--if we used zookeeper you would need to give a list
> of zk urls in case a particular zk server was down. Basically either way
> you need a few statically configured nodes to go to discover the full state
> of the cluster. For people who don't like hard coding hosts you can use a
> VIP or dns or something instead.
In our configuration, the zookeeper quorum is actually one of the few
stable (in the sense of host names / ip addresses) pillars of the
complete ecosystem: every distributed service uses zookeeper to
coordinate the hosts that make up the service as a whole. Considering
that the kafka cluster will save the information needed for this
bootstrap to zookeeper anyhow, having clients (either producers or
consumers) retrieve this information at first use makes sense to me.

We could create a routine that retrieves a list of brokers from zookeeper
before initializing a Producer, but that feels more like a workaround
for a feature that in my humble opinion could well be part of the kafka
client library. That said, I realise that having two options for
connection bootstrapping (assuming that hardcoding a list of brokers is
here to stay) could be confusing for new users, but bypassing zookeeper
for this was rather confusing for me when I first came across it :)
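
For illustration, such a routine could look roughly like the sketch below.
It assumes the brokers register themselves under /brokers/ids as JSON
containing "host" and "port" fields, and that `zk` is a client object with
kazoo-style get_children() and get() methods (for instance a started
kazoo KazooClient). The function name is made up for this example.

```python
import json

# Path under which Kafka brokers are assumed to register themselves.
BROKER_IDS_PATH = "/brokers/ids"


def broker_list_from_zookeeper(zk, path=BROKER_IDS_PATH):
    """Build a 'host:port,host:port' string from broker registrations.

    `zk` is any object offering kazoo-style get_children(path) and
    get(path) -> (data, stat) methods; it is not created here.
    """
    brokers = []
    # Broker ids are numeric znode names; sort them for a stable result.
    for broker_id in sorted(zk.get_children(path), key=int):
        data, _stat = zk.get(f"{path}/{broker_id}")
        info = json.loads(data.decode("utf-8"))
        brokers.append(f"{info['host']}:{info['port']}")
    return ",".join(brokers)
```

The resulting string is what we would then hand to the producer as its
broker list setting, which is exactly the extra step I would rather see
inside the client library.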

So, in short, I'd love it if the option to bootstrap the broker list
from zookeeper was there, rather than requiring us to configure additional
(moving) virtual hostnames or fixed ip addresses for producers in our
cluster setup. I've been baffled a few times by this option not being
available for a distributed service that coordinates itself through
zookeeper.

Just my two cents :)

Mattijs
