By default a Kafka producer will choose a random partition; however, I believe the Kafka sink by default partitions on the message key, so if the key is null it won't do a good job.
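To illustrate, the old producer's selection logic amounts to something like the sketch below (a simplified illustration, not the actual Kafka source; the class and field names are invented): keyed messages hash deterministically to a partition, while null-key messages get a random partition that is cached until the next metadata refresh (topic.metadata.refresh.interval.ms, 10 minutes by default).

    import java.util.Random;

    // Illustrative sketch (not the real Kafka code) of why a null key
    // "sticks": keyed messages hash deterministically, while the old
    // 0.8-style producer picks a random partition for null keys and
    // caches it until the next topic metadata refresh.
    public class StickyPartitionSketch {
        // default for topic.metadata.refresh.interval.ms
        private static final long REFRESH_MS = 10 * 60 * 1000L;

        private final Random random = new Random();
        private int cachedPartition = -1;
        private long cachedAt = 0L;

        public int choosePartition(String key, int numPartitions) {
            if (key != null) {
                // Deterministic: the same key always maps to the same partition
                return (key.hashCode() & 0x7fffffff) % numPartitions;
            }
            long now = System.currentTimeMillis();
            if (cachedPartition < 0 || now - cachedAt > REFRESH_MS) {
                // Null key: pick a random partition, then stick with it
                // until the refresh interval elapses
                cachedPartition = random.nextInt(numPartitions);
                cachedAt = now;
            }
            return cachedPartition;
        }
    }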
On 7 June 2016 at 09:43, Jason Williams <[email protected]> wrote:

> Hey Chris,
>
> Thanks for the help!
>
> Is that a limitation of the Flume Kafka sink or Kafka itself? Because when
> I use another Kafka producer and publish without a key, it randomly moves
> among the partitions on every publish.
>
> -J
>
> Sent via iPhone
>
> On Jun 7, 2016, at 00:08, Chris Horrocks <[email protected]> wrote:
>
> The producers bind to random partitions and move every 10 minutes. If you
> leave it long enough you should see it in the producer flume agent logs,
> although there's nothing to stop it from "randomly" choosing the same
> partition twice. Annoyingly there's no concept of producer groups (yet) to
> ensure that producers apportion the available partitions between them, as
> this would create a synchronisation issue between what should be entirely
> independent processes.
>
> --
> Chris Horrocks
>
> On 7 June 2016 at 00:32:29, Jason J. W. Williams (
> [email protected]) wrote:
>
>> Hi,
>>
>> New to Flume, and I'm trying to relay log messages received over a netcat
>> source to a Kafka sink.
>>
>> Everything seems to be fine, except that Flume is acting like it IS
>> assigning a partition key to the produced messages even though none is
>> assigned. I'd like the messages to be assigned to a random partition, so
>> that consumers are load balanced.
>>
>> * Flume 1.6.0
>> * Kafka 0.9.0.1
>>
>> Flume config:
>> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>>
>> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create
>> --topic activity.history --partitions 20 --replication-factor 1
>>
>> Python consumer program:
>> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>>
>> Test program (publishes to Flume):
>> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>>
>> The Flume agent listens on 3132/tcp for connections and publishes the
>> messages it receives to the Kafka activity.history topic. I'm running two
>> instances of the Python consumer.
>>
>> What happens, however, is that all log messages get sent to a single Kafka
>> consumer... if I restart Flume (leaving the consumers running) and re-run
>> the test, all messages get published to the other consumer. So it feels
>> like Flume is assigning a permanent partition key even though one is not
>> defined (and should therefore be random).
>>
>> Any advice is greatly appreciated.
>>
>> -J
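One possible workaround, assuming the Flume 1.6 Kafka sink uses the event header named "key" as the Kafka message key (the class below is an untested sketch, not something shipped with Flume): a custom interceptor that stamps every event with a random key, so the key-hash partitioning spreads events across partitions.

    import java.util.List;
    import java.util.UUID;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    // Sketch of a Flume interceptor that sets a random "key" header on
    // each event so the Kafka sink's keyed partitioning spreads events
    // across partitions. The class name is illustrative.
    public class RandomKeyInterceptor implements Interceptor {

        @Override
        public void initialize() {
            // no setup needed
        }

        @Override
        public Event intercept(Event event) {
            // Flume 1.6's Kafka sink reads the "key" header as the message key
            event.getHeaders().put("key", UUID.randomUUID().toString());
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> events) {
            for (Event e : events) {
                intercept(e);
            }
            return events;
        }

        @Override
        public void close() {
            // nothing to release
        }

        public static class Builder implements Interceptor.Builder {
            @Override
            public Interceptor build() {
                return new RandomKeyInterceptor();
            }

            @Override
            public void configure(Context context) {
                // no configuration options
            }
        }
    }

You would wire it in via the source's interceptor list, pointing the type at the Builder's fully qualified class name (adjust the agent/source names to match your config).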
