Yury,

Thanks for sharing the insight into Kafka partition distribution.

I am more concerned about the throughput that Kafka and Storm can
collectively deliver for event processing.

Currently I have a roughly 30 GB file with around 0.2 billion events, and
this volume will soon grow to 100 times its current size.

I was wondering whether the above-mentioned stream processing stack would
be a good fit for my case?
If yes, what configuration and tuning would make effective use of resources
and maximize throughput?
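
To make the question concrete, here is a rough sketch of the kind of
Kafka-Storm wiring in question (class names come from the storm-kafka
integration; the ZooKeeper address, topic name, and parallelism below are
placeholders, and the spout parallelism is assumed to match the topic's
partition count so each partition gets a dedicated reader):

import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

// Placeholder ZooKeeper connect string for the Kafka cluster.
ZkHosts hosts = new ZkHosts("zkhost:2181");
// Topic name, ZK root, and spout id below are placeholders.
SpoutConfig spoutConfig =
    new SpoutConfig(hosts, "events", "/kafka-spout", "event-reader");

TopologyBuilder builder = new TopologyBuilder();
// Spout parallelism matched to the number of Kafka partitions (assumed
// to be 8 here) so every partition has its own reader task.
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 8);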

Thanks!
On Feb 3, 2015 8:38 PM, "Yury Ruchin" <yuri.ruc...@gmail.com> wrote:

> This is a quote from Kafka documentation:
> "The routing decision is influenced by the kafka.producer.Partitioner.
>
> interface Partitioner<T> {
>    int partition(T key, int numPartitions);
> }
> The partition API uses the key and the number of available broker
> partitions to return a partition id. This id is used as an index into a
> sorted list of broker_ids and partitions to pick a broker partition for the
> producer request. The default partitioning strategy is
> hash(key)%numPartitions. If the key is null, then a random broker partition
> is picked. A custom partitioning strategy can also be plugged in using the
> partitioner.class config parameter."
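>
> To illustrate, a minimal custom partitioner against the interface quoted
> above could look like the following (the class name is hypothetical, and
> it simply reproduces the default hash(key)%numPartitions strategy; it
> would be wired in through the partitioner.class property):
>
> public class KeyHashPartitioner implements Partitioner<String> {
>     public int partition(String key, int numPartitions) {
>         // Mask off the sign bit so a negative hashCode() can never
>         // yield a negative partition index.
>         return (key.hashCode() & 0x7fffffff) % numPartitions;
>     }
> }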
>
> An important point for the null key is that the randomly chosen broker
> partition sticks for the time specified by "
> topic.metadata.refresh.interval.ms" which is 10 minutes by default. So if
> you are using null key for Logstash entries, you will be writing to the
> same partition for 10 minutes. Is this your case?
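>
> If so, a sketch of the two knobs that change this behavior, assuming the
> 0.8.x producer API (broker address, topic, key, and payload below are all
> placeholders):
>
> import java.util.Properties;
> import kafka.javaapi.producer.Producer;
> import kafka.producer.KeyedMessage;
> import kafka.producer.ProducerConfig;
>
> Properties props = new Properties();
> props.put("metadata.broker.list", "localhost:9092");
> props.put("serializer.class", "kafka.serializer.StringEncoder");
> // Knob 1: refresh metadata more often than the default 10 minutes, so a
> // null-key producer re-picks its random partition every minute instead.
> props.put("topic.metadata.refresh.interval.ms", "60000");
> Producer<String, String> producer =
>     new Producer<String, String>(new ProducerConfig(props));
>
> // Knob 2: supply a non-null key, so hash(key)%numPartitions spreads
> // messages across all partitions instead of sticking to one.
> producer.send(new KeyedMessage<String, String>("logstash-events",
>     "host-42", "the log line payload"));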
>
> 2015-02-03 14:03 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
>
> > Hi,
> >
> > I have a setup where I am sniffing some logs (the big ones, of course)
> > through Logstash Forwarder and forwarding them to Logstash, which in turn
> > publishes these events to Kafka.
> >
> > I have created the Kafka topic with the required number of partitions and
> > replication factor, but I am not sure about the Logstash output
> > configuration; I have the following doubts about it.
> >
> > For Logstash publishing events to Kafka:
> >
> > 1) Do we need to explicitly define the partition in Logstash while
> > publishing to Kafka?
> > 2) Will Kafka take care of proper distribution of the data across the
> > partitions?
> >
> > I have a notion that, despite declaring the partitions when creating the
> > Kafka topic, the data from Logstash is being pushed to a single partition,
> > or perhaps not getting uniformly distributed.
> >
> > Looking for expert advice.
> >
> > Thanks!
> >
>
