Thanks again, Chris. I'm still curious, though, why I see the round-robin behavior I expected when I use kafka-console-producer to inject messages.
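
For reference, this is roughly how I've been watching where keyless messages land outside of Flume (a quick sketch using the kafka-python client against my activity.history topic; the broker address is just my local setup):

    # Sketch: send keyless messages and print which partition each one
    # lands on, using kafka-python. Broker address assumes a local broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers='localhost:9092')

    for i in range(10):
        # No key is supplied, so the client library picks the partition.
        future = producer.send('activity.history', ('test %d' % i).encode())
        metadata = future.get(timeout=10)  # RecordMetadata for this send
        print('message %d -> partition %d' % (i, metadata.partition))

    producer.flush()

Watching metadata.partition is how I've been seeing the per-publish movement across partitions that I described earlier.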
-J

On Tuesday, June 7, 2016, Chris Horrocks <[email protected]> wrote:

> It's by design of Kafka (and by extension Flume). The producers are
> designed to be many-to-one (producers to partitions), and as such picking
> a random partition every 10 minutes prevents separate producer instances
> from all randomly picking the same partition.
>
> --
> Chris Horrocks
>
> From: Jason Williams <[email protected]>
> Reply: [email protected]
> Date: 7 June 2016 at 09:43:34
> To: [email protected]
> Subject: Re: Kafka Sink random partition assignment
>
> Hey Chris,
>
> Thanks for the help!
>
> Is that a limitation of the Flume Kafka sink or of Kafka itself? Because
> when I use another Kafka producer and publish without a key, it randomly
> moves among the partitions on every publish.
>
> -J
>
> Sent via iPhone
>
> On Jun 7, 2016, at 00:08, Chris Horrocks <[email protected]> wrote:
>
> The producers bind to random partitions and move every 10 minutes. If you
> leave it long enough you should see it in the producer Flume agent's
> logs, although there's nothing to stop it from "randomly" choosing the
> same partition twice. Annoyingly, there's no concept of producer groups
> (yet) to ensure that producers apportion the available partitions between
> them, as this would create a synchronisation issue between what should be
> entirely independent processes.
>
> --
> Chris Horrocks
>
> On 7 June 2016 at 00:32:29, Jason J. W. Williams
> ([email protected]) wrote:
>
>> Hi,
>>
>> New to Flume, and I'm trying to relay log messages received over a
>> netcat source to a Kafka sink.
>>
>> Everything seems to be fine, except that Flume is acting like it IS
>> assigning a partition key to the produced messages even though none is
>> assigned. I'd like the messages to be assigned to a random partition, so
>> that consumers are load balanced.
>>
>> * Flume 1.6.0
>> * Kafka 0.9.0.1
>>
>> Flume config:
>> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>>
>> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create
>> --topic activity.history --partitions 20 --replication-factor 1
>>
>> Python consumer program:
>> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>>
>> Test program (publishes to Flume):
>> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>>
>> The Flume agent listens on 3132/tcp for connections and publishes the
>> messages it receives to the Kafka activity.history topic. I'm running
>> two instances of the Python consumer.
>>
>> What happens, however, is that all log messages get sent to a single
>> Kafka consumer... and if I restart Flume (leaving the consumers running)
>> and re-run the test, all messages get published to the other consumer.
>> So it feels like Flume is assigning a permanent partition key even
>> though one is not defined (and should therefore be random).
>>
>> Any advice is greatly appreciated.
>>
>> -J
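
P.S. Based on your explanation, the workaround I'm going to try is stamping a random key onto each event so the Kafka sink partitions per message instead of sticking to one partition between rebinds. This is a rough sketch from my reading of the Flume 1.6 user guide (the agent/source names here are from my config, and the UUID interceptor ships in the morphline Solr sink module, so the class name and classpath are things to verify):

    # Sketch: give every event a random "key" header, which the Flume 1.6
    # Kafka sink uses as the Kafka message key for partitioning.
    # "a1"/"netcat" are the agent/source names from my config; the
    # interceptor class comes from the flume-ng-morphline-solr-sink
    # module and needs to be on the Flume classpath.
    a1.sources.netcat.interceptors = i1
    a1.sources.netcat.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
    # Write the UUID into the "key" header instead of the default "id".
    a1.sources.netcat.interceptors.i1.headerName = key

If the key is unique per event, the partition should be derived from the key hash each time, so messages should spread across all 20 partitions and both consumers should see traffic.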
