Hey Chris, Thanks for help!
Is that a limitation of the Flume Kafka sink or Kafka itself? Because when I use another Kafka producer and publish without a key, it randomly moves among the partitions on every publish. -J Sent via iPhone > On Jun 7, 2016, at 00:08, Chris Horrocks <[email protected]> wrote: > > The producers bind to random partitions and move every 10 minutes. If you > leave it long enough you should see it in the producer flume agent logs, > although there's nothing to stop it from "randomly" choosing the same > partition twice. Annoyingly there's no concept of producer groups (yet) to > ensure that producers apportion the available partitions between them as this > would create a synchronisation issue between what should be entirely > independent processes. > > -- > Chris Horrocks >> On 7 June 2016 at 00:32:29, Jason J. W. Williams ([email protected]) >> wrote: >> >> Hi, >> >> New to flume and I'm trying to relay log messages received over netcat >> source to Kafka sink. >> >> Everything seems to be fine, except that Flume is acting like it IS >> assigning a partition key to the produced messages though none is assigned. >> I'd like the messages to be assigned to a random partition, so that >> consumers are load balanced. >> >> * Flume 1.6.0 >> * Kafka 0.9.0.1 >> >> Flume config: >> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c >> >> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create >> --topic activity.history --partitions 20 --replication-factor 1 >> >> Python consumer program: >> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437 >> >> Test program (publishes to Flume): >> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b >> >> Flume agent listens on 3132tcp for connections, and publishes messages >> received to the Kafka activity.history topic. I'm running two instances of >> the Python consumer. >> >> What happens however, is all logs messages get sent to a single Kafka >> consumer...if I restart Flume (leave consumers running) and re-run the test, >> all messages get published to the other consumer. So it feels like Flume is >> assigning a permanent partition key even though one is not defined (and >> should therefore be random). >> >> Any advice is greatly appreciated. >> >> -J
