Hi Robert,

Here is the Kafka benchmark for your reference. If you use Flink, Storm, Samza, or Spark, expect the performance to be lower than this.
821,557 records/sec (78.3 MB/sec)
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Best regards
Hawin

On Tue, Aug 4, 2015 at 11:57 AM, Robert Metzger <rmetz...@apache.org> wrote:

> Sorry for the very late reply ...
>
> The performance issue was not caused by network latency. I had a job like
> this:
> FlinkKafkaConsumer --> someSimpleOperation --> FlinkKafkaProducer.
>
> I thought that our FlinkKafkaConsumer is slow, but actually our
> FlinkKafkaProducer was using the old producer API of Kafka. Switching to
> the new producer API of Kafka greatly improved our writing performance to
> Kafka. Flink was slowing down the KafkaConsumer because of the producer.
>
> Since we are already talking about performance, let me ask you the
> following question:
> I am using Kafka and Flink on a HDP 2.2 cluster (with 40 machines). What
> would you consider a good read/write performance for 8-byte messages on the
> following setup?
> - 40 brokers,
> - topic with 120 partitions
> - 120 reading threads (on 30 machines)
> - 120 writing threads (on 30 machines)
>
> I'm getting a write throughput of ~75k elements/core/second and a read
> throughput of ~50k el/c/s.
> When I'm stopping the writers, the read throughput goes up to 130k.
> I would expect a higher throughput than (8*75000) / 1024 = 585.9 kb/sec per
> partition .. or are the messages too small and the overhead is very high.
>
> Which system out there would you recommend for getting reference
> performance numbers? Samza, Spark, Storm?
>
>
> On Wed, Jul 15, 2015 at 7:20 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > This is not something you can use the consumer API to simply do easily
> > (consumers don't have locality notion).
> > I can imagine using Kafka's low-level API calls to get a list of
> > partitions and the lead replica, figuring out which are local and
> > using those - but that sounds painful.
> >
> > Are you 100% sure the performance issue is due to network latency? If
> > not, you may want to start optimizing somewhere more productive :)
> > Kafka brokers and clients both have Metrics that may help you track
> > where the performance issues are coming from.
> >
> > Gwen
> >
> > On Wed, Jul 15, 2015 at 9:24 AM, Robert Metzger <rmetz...@apache.org>
> > wrote:
> > > Hi Shef,
> > >
> > > did you resolve this issue?
> > > I'm facing some performance issues and I was wondering whether reading
> > > locally would resolve them.
> > >
> > > On Mon, Jun 22, 2015 at 11:43 PM, Shef <she...@yahoo.com> wrote:
> > >
> > >> Noob question here. I want to have a single consumer for each partition
> > >> that consumes only the messages that have been written locally. In other
> > >> words, I want the consumer to access the local disk and not pull anything
> > >> across the network. Possible?
> > >>
> > >> How can I discover which partitions are local?
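
For reference, the per-partition figure Robert computes in the quoted message can be reproduced with a few lines of Python. This is only a sketch: it counts payload bytes alone (no protocol or per-message overhead), and the mapping of one writing thread to one partition is an assumption read off the 120 threads / 120 partitions setup, not something the thread states outright. The names used are illustrative.

```python
# Figures quoted in the thread (Robert's setup).
MESSAGE_BYTES = 8          # payload size per message
WRITES_PER_CORE = 75_000   # ~75k elements/core/second
PARTITIONS = 120           # assumption: one writing thread per partition


def kib_per_sec(messages_per_sec: int, message_bytes: int) -> float:
    """Payload throughput in KiB/s, ignoring protocol and per-message overhead."""
    return messages_per_sec * message_bytes / 1024


# Per-partition payload rate: (8 * 75000) / 1024
per_partition = kib_per_sec(WRITES_PER_CORE, MESSAGE_BYTES)

# Aggregate payload rate across all partitions, in MiB/s.
aggregate_mib = per_partition * PARTITIONS / 1024

print(f"per partition: {per_partition:.1f} KiB/s")
print(f"aggregate:     {aggregate_mib:.1f} MiB/s")
```

Under these assumptions the script prints about 585.9 KiB/s per partition and about 68.7 MiB/s in aggregate, so the small per-partition number is at least partly a consequence of the 8-byte payload rather than of the message rate itself.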