Hi,

In short, I don't see Kafka having problems with those numbers.  Logstash
will have a harder time, I believe.
That said, it all depends on how you tune things and what kind of / how much
hardware you use.

0.2B or 20B events, yes, big numbers, but how quickly do you need to process
those? In 1 minute, 1 hour, 1 day, or a week? :)
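
To put rough numbers on it: 20B events processed over a week is about 33K
events/sec, over a day about 230K events/sec, and over an hour about 5.5M
events/sec. Those call for very different cluster sizes and tuning.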

SPM for Kafka (http://sematext.com/spm) will show you all possible Kafka
metrics you can imagine, so if you decide to give Kafka a try you'll be
able to tune Kafka with the help of SPM for Kafka charts and the help of
people on this mailing list.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Feb 5, 2015 at 2:12 PM, Vineet Mishra <clearmido...@gmail.com>
wrote:

> Yury,
>
> Thanks for sharing the insight into Kafka partition distribution.
>
> I am more concerned about the throughput that Kafka-Storm can
> collectively deliver for event processing.
>
> Currently I have a roughly 30 GB file with around 0.2 billion events, and
> this number will soon rise to 100 times the existing volume.
>
> I was wondering whether the above-mentioned stream processing engine will
> be a good fit in my case?
> If yes, then with what configuration and tuning, so as to use resources
> effectively and maximize throughput?
>
> Thanks!
> On Feb 3, 2015 8:38 PM, "Yury Ruchin" <yuri.ruc...@gmail.com> wrote:
>
> > This is a quote from Kafka documentation:
> > "The routing decision is influenced by the kafka.producer.Partitioner.
> >
> > interface Partitioner<T> {
> >    int partition(T key, int numPartitions);
> > }
> > The partition API uses the key and the number of available broker
> > partitions to return a partition id. This id is used as an index into a
> > sorted list of broker_ids and partitions to pick a broker partition for
> > the producer request. The default partitioning strategy is
> > hash(key)%numPartitions. If the key is null, then a random broker
> > partition is picked. A custom partitioning strategy can also be plugged
> > in using the partitioner.class config parameter."
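> >
> > For illustration, a custom partitioner against the interface quoted
> > above might look like the sketch below (class and package names are
> > made up; you would plug it in via the partitioner.class property). Per
> > the quote, the null-key case is handled by the producer itself before
> > the partitioner is consulted.
> >
> > import kafka.producer.Partitioner;
> >
> > // Sketch: spread messages across all partitions by hashing the key.
> > public class KeyHashPartitioner implements Partitioner<String> {
> >     @Override
> >     public int partition(String key, int numPartitions) {
> >         // Mask the sign bit rather than calling Math.abs(), which
> >         // returns a negative value for Integer.MIN_VALUE.
> >         return (key.hashCode() & 0x7fffffff) % numPartitions;
> >     }
> > }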
> >
> > An important point for the null key is that the randomly chosen broker
> > partition sticks for the time specified by
> > "topic.metadata.refresh.interval.ms", which is 10 minutes by default. So
> > if you are using a null key for your Logstash entries, you will be
> > writing to the same partition for 10 minutes. Is this your case?
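> >
> > If that is what is happening, two ways around it come to mind: supply a
> > non-null key so the hash-based routing kicks in, or shorten the metadata
> > refresh interval. A rough sketch with the 0.8 producer API (broker
> > addresses, topic name, and the hostName/line variables are placeholders):
> >
> > import java.util.Properties;
> > import kafka.javaapi.producer.Producer;
> > import kafka.producer.KeyedMessage;
> > import kafka.producer.ProducerConfig;
> >
> > Properties props = new Properties();
> > props.put("metadata.broker.list", "broker1:9092,broker2:9092");
> > props.put("serializer.class", "kafka.serializer.StringEncoder");
> > // Re-pick the random partition for null keys every minute instead of
> > // the 10-minute default:
> > props.put("topic.metadata.refresh.interval.ms", "60000");
> >
> > Producer<String, String> producer =
> >     new Producer<String, String>(new ProducerConfig(props));
> > // A non-null key (here the source host) routes via
> > // hash(key) % numPartitions instead of sticking to one partition:
> > producer.send(new KeyedMessage<String, String>("logs", hostName, line));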
> >
> > 2015-02-03 14:03 GMT+03:00 Vineet Mishra <clearmido...@gmail.com>:
> >
> > > Hi,
> > >
> > > I have a setup where I am sniffing some logs (of course, the big
> > > ones) through Logstash Forwarder and forwarding them to Logstash,
> > > which in turn publishes these events to Kafka.
> > >
> > > I have created the Kafka topic with the required number of partitions
> > > and replication factor, but I am not sure about the Logstash output
> > > configuration, and I have the following doubts with reference to it.
> > >
> > > For Logstash publishing events to Kafka:
> > >
> > > 1) Do we need to explicitly define the partition in Logstash while
> > > publishing to Kafka?
> > > 2) Will Kafka take care of the proper distribution of the data across
> > > the partitions?
> > >
> > > I have a notion that despite declaring the partitions in Kafka when
> > > creating the topic, the data from Logstash is being pushed to a single
> > > partition, or perhaps not getting uniformly distributed.
> > >
> > > Looking for expert advice.
> > >
> > > Thanks!
> > >
> >
>
