This is a quote from the Kafka documentation:
"The routing decision is influenced by the kafka.producer.Partitioner.

interface Partitioner<T> {
   int partition(T key, int numPartitions);
}
The partition API uses the key and the number of available broker
partitions to return a partition id. This id is used as an index into a
sorted list of broker_ids and partitions to pick a broker partition for the
producer request. The default partitioning strategy is
hash(key)%numPartitions. If the key is null, then a random broker partition
is picked. A custom partitioning strategy can also be plugged in using the
partitioner.class config parameter."
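
For reference, a custom strategy plugged in through "partitioner.class"
might look like the following minimal sketch against the interface as
quoted (the class name is made up; the bit mask just keeps the hash
non-negative):

import kafka.producer.Partitioner;

public class KeyHashPartitioner implements Partitioner<String> {
   // Same as the documented default strategy: hash(key) % numPartitions.
   // Masking the sign bit keeps the result non-negative even when
   // hashCode() returns a negative value.
   public int partition(String key, int numPartitions) {
      return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
   }
}

It would then be enabled with partitioner.class=KeyHashPartitioner in
the producer config, as the quoted passage says.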

An important point for the null key is that the randomly chosen broker
partition sticks for the time specified by
"topic.metadata.refresh.interval.ms", which is 10 minutes by default. So
if you are using a null key for Logstash entries, you will be writing to
the same partition for 10 minutes at a time. Is this your case?
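
If that sticky behavior is the problem, one workaround is to shorten the
refresh interval so the random partition is re-picked more often. A
minimal sketch against the 0.8 producer API (the broker address and the
60-second value are just examples):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

public class ShortRefreshProducer {
   public static void main(String[] args) {
      Properties props = new Properties();
      props.put("metadata.broker.list", "localhost:9092"); // assumed broker
      props.put("serializer.class", "kafka.serializer.StringEncoder");
      // Re-pick the random partition every minute instead of the
      // default 600000 ms (10 minutes).
      props.put("topic.metadata.refresh.interval.ms", "60000");
      Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
      producer.close();
   }
}

Alternatively, supplying a non-null key lets the default
hash(key) % numPartitions strategy spread messages across partitions.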

2015-02-03 14:03 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:

> Hi,
>
> I have a setup where I am sniffing some logs (the big ones, of course)
> through Logstash Forwarder and forwarding them to Logstash, which in
> turn publishes these events to Kafka.
>
> I have created the Kafka topic with the required number of partitions
> and replication factor, but I am not sure about the Logstash output
> configuration; I have the following doubts about it.
>
> For Logstash publishing events to Kafka:
>
> 1) Do we need to explicitly define the partition in Logstash while
> publishing to Kafka?
> 2) Will Kafka take care of properly distributing the data across the
> partitions?
>
> I have a notion that, despite declaring the partitions while creating
> the topic, the data from Logstash is being pushed to a single
> partition, or at least is not getting uniformly distributed.
>
> Looking for expert advice.
>
> Thanks!
>