Hmmm, I would really strongly urge us not to introduce a zk dependency just
for discovery. People who want this can certainly implement it themselves by
looking up the broker URLs in zk and setting them in the consumer config, but
our experience with direct zk connections from clients at large scale was
pretty bad. Hardcoding the discovery broker URLs shouldn't be any worse than
hardcoding the zk URLs, and folks who want to avoid that can use DNS or a VIP.
I think this will be better in every way for 99% of people.

-Jay
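
As a rough illustration of the "few statically configured nodes" idea above,
here is a minimal sketch in Python. It is not the Kafka client's code:
`fetch_metadata` is a hypothetical stand-in for whatever metadata request the
client sends, and the host names are made up. It only shows that one or two
bootstrap addresses (or a single DNS name / VIP) are enough to discover the
rest of the cluster.

```python
from typing import Callable, Iterable, List, Tuple

Address = Tuple[str, int]


def parse_bootstrap(urls: str) -> List[Address]:
    """Split 'host1:9092,host2:9092' into (host, port) pairs."""
    addresses = []
    for entry in urls.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        addresses.append((host, int(port)))
    return addresses


def discover_cluster(
    bootstrap: Iterable[Address],
    fetch_metadata: Callable[[Address], List[Address]],
) -> List[Address]:
    """Ask each bootstrap node in turn; return the first full broker list."""
    last_error = None
    for address in bootstrap:
        try:
            # fetch_metadata is hypothetical: it should return every broker
            # in the cluster as known by the node at `address`.
            return fetch_metadata(address)
        except OSError as exc:  # node down or unreachable: try the next one
            last_error = exc
    raise ConnectionError("no bootstrap broker reachable") from last_error
```

With a config value such as "kafka-vip.example.com:9092" (an assumed name),
`parse_bootstrap` yields a single address, and adding brokers to the cluster
never requires touching that value.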

On Tue, Jan 28, 2014 at 10:09 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> >> The producer since 0.8 is actually zookeeper free, so this is not new to
> this client; it is true for the current client as well. Our experience was
> that direct zookeeper connections from zillions of producers wasn't a good
> idea for a number of reasons.
>
> The problem with several thousand connections to zookeeper is mainly the
> long lived sessions causing overhead on zookeeper.
> This further degrades zookeeper performance causing it to be flaky and
> expire sessions/disconnect clients and so on. That being said,
> I don't see why we can't use zookeeper *just* for the bootstrap on client
> startup and close the connection right after the bootstrap is done.
> IMO, this is more intuitive and convenient as it will allow users to use the
> same "bootstrap config" across producers, consumers and brokers and
> will not cause any performance/operational issues on zookeeper. This is
> assuming that all the zillion clients don't bootstrap at the same time,
> which is rare in practice.
>
> Thanks,
> Neha
>
>
> On Tue, Jan 28, 2014 at 8:02 AM, Mattijs Ugen (DT) <matt...@holmes.nl> wrote:
>
> > Sorry to tune in a bit late, but here goes.
> >
> > > 1. The producer since 0.8 is actually zookeeper free, so this is not new
> > > to this client; it is true for the current client as well. Our experience
> > > was that direct zookeeper connections from zillions of producers wasn't a
> > > good idea for a number of reasons. Our intention is to remove this
> > > dependency from the consumer as well. The configuration in the producer
> > > doesn't need the full set of brokers, though, just one or two machines to
> > > bootstrap the state of the cluster from--in other words it isn't like you
> > > need to reconfigure your clients every time you add some servers. This is
> > > exactly how zookeeper works too--if we used zookeeper you would need to
> > > give a list of zk urls in case a particular zk server was down. Basically
> > > either way you need a few statically configured nodes to go to discover
> > > the full state of the cluster. For people who don't like hard coding hosts
> > > you can use a VIP or DNS or something instead.
> > In our configuration, the zookeeper quorum is actually one of the few
> > stable (in the sense of host names / ip addresses) pillars of the
> > complete ecosystem: every distributed service uses zookeeper to
> > coordinate the hosts that make up the service as a whole. Considering
> > that the kafka cluster will save the information needed for this
> > bootstrap to zookeeper anyhow, having clients (either producers or
> > consumers) retrieve this information at first use makes sense to me.
> >
> > We could create a routine that retrieves a list of brokers from zookeeper
> > before initializing a Producer, but that feels more like a workaround
> > for a feature that in my humble opinion could well be part of the kafka
> > client library. That said, I realise that having two options for
> > connection bootstrapping (assuming that hardcoding a list of brokers is
> > here to stay) could be confusing for new users, but bypassing zookeeper
> > for this was rather confusing for me when I first came across it :)
> >
> > So, in short, I'd love it if the option to bootstrap the broker list
> > from zookeeper was there, rather than requiring us to configure additional
> > (moving) virtual hostnames or fixed ip addresses for producers in our
> > cluster setup. I've been baffled a few times by this option not being
> > available for a distributed service that coordinates itself through
> > zookeeper.
> >
> > Just my two cents :)
> >
> > Mattijs
> >
>
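
The routine Mattijs describes (read the broker list from zookeeper once, then
hand it to the producer) could look roughly like the sketch below. It assumes
the 0.8-era layout in which each broker registers a JSON document containing
"host" and "port" under /brokers/ids/<id>, and it assumes the caller has
already read those documents with some zookeeper client and closed the
session afterwards. The function name and host names are made up for
illustration.

```python
import json
from typing import Iterable


def broker_list_from_registrations(payloads: Iterable[bytes]) -> str:
    """Build a 'host:port,host:port' string from broker registration JSON.

    `payloads` are the raw znode contents, one per registered broker.
    Fetching them (and closing the zookeeper session right after) is left
    to the caller, so no long-lived zookeeper connection is needed.
    """
    entries = []
    for raw in payloads:
        info = json.loads(raw)
        entries.append(f"{info['host']}:{info['port']}")
    # Sort so the resulting config value is stable between runs.
    return ",".join(sorted(entries))
```

The resulting string is in the shape the 0.8 producer expects for its broker
list setting, so it can be dropped into the producer config before the
producer is created.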
