Hello Caleb,

`./gradlew jar` should be sufficient to run the SimpleBenchmark. Could you look into kafka-run-class.sh and see if "streams/build/libs/kafka-streams*.jar" is added to the classpath? In trunk it is added.
Guozhang

On Sat, Sep 17, 2016 at 11:30 AM, Eno Thereska <eno.there...@gmail.com> wrote:
> Hi Caleb,
>
> I usually do './gradlew installAll' first and that places all the jars in
> my local maven repo in ~/.m2/repository.
>
> Eno
>
> > On 17 Sep 2016, at 00:30, Caleb Welton <ca...@autonomic.ai> wrote:
> >
> > Is there a specific way that I need to build kafka for that to work?
> >
> > bash$ export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh org.apache.kafka.streams.perf.SimpleBenchmark
> > Error: Could not find or load main class org.apache.kafka.streams.perf.SimpleBenchmark
> >
> > bash$ find . -name SimpleBenchmark.java
> > ./streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java
> >
> > bash$ find . -name SimpleBenchmark.class
> >
> > bash$ jar -tf streams/build/libs/kafka-streams-0.10.1.0-SNAPSHOT.jar | grep SimpleBenchmark
> >
> > On Sat, Sep 10, 2016 at 6:18 AM, Eno Thereska <eno.there...@gmail.com> wrote:
> >
> > > Hi Caleb,
> > >
> > > We have a benchmark that we run nightly to keep track of performance. The
> > > numbers we have do indicate that consuming through streams is indeed slower
> > > than just a pure consumer, however the performance difference is not as
> > > large as you are observing. Would it be possible for you to run this
> > > benchmark in your environment and report the numbers? The benchmark is
> > > included with Kafka Streams and you run it like this:
> > >
> > > export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh org.apache.kafka.streams.perf.SimpleBenchmark
> > >
> > > You'll need a Kafka broker to be running. The benchmark will report
> > > various numbers, but one of them is the performance of the consumer and
> > > the performance of streams reading.
> > >
> > > Thanks
> > > Eno
> > >
> > > > On 9 Sep 2016, at 18:19, Caleb Welton <cewel...@yahoo.com.INVALID> wrote:
> > > >
> > > > Same in both cases:
> > > >
> > > > client.id=Test-Prototype
> > > > application.id=test-prototype
> > > > group.id=test-consumer-group
> > > > bootstrap.servers=broker1:9092,broker2:9092
> > > > zookeeper.connect=zk1:2181
> > > > replication.factor=2
> > > > auto.offset.reset=earliest
> > > >
> > > > On Friday, September 9, 2016 8:48 AM, Eno Thereska <eno.there...@gmail.com> wrote:
> > > >
> > > > Hi Caleb,
> > > >
> > > > Could you share your Kafka Streams configuration (i.e., the StreamsConfig
> > > > properties you might have set before the test)?
> > > >
> > > > Thanks
> > > > Eno
> > > >
> > > > On Thu, Sep 8, 2016 at 12:46 AM, Caleb Welton <cwel...@apache.org> wrote:
> > > >
> > > > > I have a question with respect to the KafkaStreams API.
> > > > >
> > > > > I noticed during my prototyping work that my KafkaStreams application was
> > > > > not able to keep up with the input on the stream, so I dug into it a bit
> > > > > and found that it was spending an inordinate amount of time in
> > > > > org.apache.kafka.common.network.Selector.select(). Not exactly a smoking
> > > > > gun by itself, so I dropped the implementation down to a single processor
> > > > > reading off a source.
> > > > >
> > > > > public class TestProcessor extends AbstractProcessor<String, String> {
> > > > >     static long start = -1;
> > > > >     static long count = 0;
> > > > >
> > > > >     @Override
> > > > >     public void process(String key, String value) {
> > > > >         if (start < 0) {
> > > > >             start = System.currentTimeMillis();
> > > > >         }
> > > > >         count += 1;
> > > > >         if (count > 1000000) {
> > > > >             long end = System.currentTimeMillis();
> > > > >             double time = (end - start) / 1000.0;
> > > > >             System.out.printf("Processed %d records in %f seconds (%f records/s)\n",
> > > > >                 count, time, count / time);
> > > > >             start = -1;
> > > > >             count = 0;
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > ...
> > > > >
> > > > > TopologyBuilder topologyBuilder = new TopologyBuilder();
> > > > > topologyBuilder
> > > > >     .addSource("SOURCE", stringDeserializer, stringDeserializer, "input")
> > > > >     .addProcessor("PROCESS", TestProcessor::new, "SOURCE");
> > > > >
> > > > > Which I then ran through the KafkaStreams API, and then repeated with
> > > > > the KafkaConsumer API.
> > > > >
> > > > > Using the KafkaConsumer API:
> > > > > Processed 1000001 records in 1.790000 seconds (558659.776536 records/s)
> > > > > Processed 1000001 records in 1.229000 seconds (813670.463792 records/s)
> > > > > Processed 1000001 records in 1.106000 seconds (904160.036166 records/s)
> > > > > Processed 1000001 records in 1.190000 seconds (840336.974790 records/s)
> > > > >
> > > > > Using the KafkaStreams API:
> > > > > Processed 1000001 records in 6.407000 seconds (156079.444358 records/s)
> > > > > Processed 1000001 records in 5.256000 seconds (190258.942161 records/s)
> > > > > Processed 1000001 records in 5.141000 seconds (194514.880373 records/s)
> > > > > Processed 1000001 records in 5.111000 seconds (195656.622970 records/s)
> > > > >
> > > > > The profile on the KafkaStreams run consisted of:
> > > > >
> > > > > 89.2% org.apache.kafka.common.network.Selector.select()
> > > > >  7.6% org.apache.kafka.clients.producer.internals.ProduceRequestResult.await()
> > > > >  0.8% org.apache.kafka.common.network.PlaintextTransportLayer.read()
> > > > >
> > > > > Is a 5X performance difference between the KafkaConsumer and the
> > > > > KafkaStreams API expected?
> > > > >
> > > > > Are there specific things I can do to diagnose/tune the system?
> > > > >
> > > > > Thanks,
> > > > > Caleb

--
Guozhang
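[Editor's note: since the whole thread hinges on trusting the throughput numbers printed by TestProcessor, the counting/timing pattern it uses can be factored out and sanity-checked without a broker. The sketch below is illustrative only; `ThroughputMeter` and its members are hypothetical names, not part of Kafka or the code in the thread, and the clock is injected so the arithmetic can be verified deterministically.]

```java
import java.util.function.LongSupplier;

// Sketch: the timing logic from TestProcessor, extracted so it can be
// exercised in isolation with a fake clock. Names are hypothetical.
public class ThroughputMeter {
    private final LongSupplier clock;   // millisecond clock, injectable for tests
    private final long reportEvery;     // report after this many records
    private long start = -1;
    private long count = 0;
    private double lastRate = Double.NaN;

    public ThroughputMeter(long reportEvery, LongSupplier clock) {
        this.reportEvery = reportEvery;
        this.clock = clock;
    }

    // Call once per record; mirrors TestProcessor.process().
    public void record() {
        if (start < 0) {
            start = clock.getAsLong();
        }
        count += 1;
        if (count > reportEvery) {
            double time = (clock.getAsLong() - start) / 1000.0;
            lastRate = count / time;
            System.out.printf("Processed %d records in %f seconds (%f records/s)%n",
                    count, time, lastRate);
            start = -1;   // reset for the next window, as in TestProcessor
            count = 0;
        }
    }

    public double lastRate() {
        return lastRate;
    }
}
```

Driving it with a fake clock that advances one millisecond per record reproduces the same records/s arithmetic as the printf in the thread, which makes it easy to confirm the reported rates are computed over a single window rather than cumulatively.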