Is there a specific way that I need to build Kafka for that to work?

bash$ export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh org.apache.kafka.streams.perf.SimpleBenchmark
Error: Could not find or load main class org.apache.kafka.streams.perf.SimpleBenchmark
bash$ find . -name SimpleBenchmark.java
./streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java
bash$ find . -name SimpleBenchmark.class
bash$ jar -tf streams/build/libs/kafka-streams-0.10.1.0-SNAPSHOT.jar | grep SimpleBenchmark

On Sat, Sep 10, 2016 at 6:18 AM, Eno Thereska <eno.there...@gmail.com> wrote:
> Hi Caleb,
>
> We have a benchmark that we run nightly to keep track of performance. The
> numbers we have do indicate that consuming through streams is indeed slower
> than just a pure consumer, however the performance difference is not as
> large as you are observing. Would it be possible for you to run this
> benchmark in your environment and report the numbers? The benchmark is
> included with Kafka Streams and you run it like this:
>
> export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh
> org.apache.kafka.streams.perf.SimpleBenchmark
>
> You'll need a Kafka broker to be running. The benchmark will report
> various numbers, but one of them is the performance of the consumer and the
> performance of streams reading.
>
> Thanks
> Eno
>
> > On 9 Sep 2016, at 18:19, Caleb Welton <cewel...@yahoo.com.INVALID> wrote:
> >
> > Same in both cases:
> > client.id=Test-Prototype
> > application.id=test-prototype
> >
> > group.id=test-consumer-group
> > bootstrap.servers=broker1:9092,broker2:9092
> > zookeeper.connect=zk1:2181
> > replication.factor=2
> >
> > auto.offset.reset=earliest
> >
> >
> > On Friday, September 9, 2016 8:48 AM, Eno Thereska <eno.there...@gmail.com> wrote:
> >
> > Hi Caleb,
> >
> > Could you share your Kafka Streams configuration (i.e., StreamsConfig
> > properties you might have set before the test)?
> >
> > Thanks
> > Eno
> >
> >
> > On Thu, Sep 8, 2016 at 12:46 AM, Caleb Welton <cwel...@apache.org> wrote:
> >
> >> I have a question with respect to the KafkaStreams API.
> >>
> >> I noticed during my prototyping work that my KafkaStreams application was
> >> not able to keep up with the input on the stream, so I dug into it a bit and
> >> found that it was spending an inordinate amount of time in
> >> org.apache.kafka.common.network.Selector.select(). Not exactly a smoking
> >> gun in itself, so I dropped the implementation down to a single processor
> >> reading off a source.
> >>
> >> public class TestProcessor extends AbstractProcessor<String, String> {
> >>     static long start = -1;
> >>     static long count = 0;
> >>
> >>     @Override
> >>     public void process(String key, String value) {
> >>         if (start < 0) {
> >>             start = System.currentTimeMillis();
> >>         }
> >>         count += 1;
> >>         if (count > 1000000) {
> >>             long end = System.currentTimeMillis();
> >>             double time = (end - start) / 1000.0;
> >>             System.out.printf("Processed %d records in %f seconds (%f records/s)\n",
> >>                     count, time, count / time);
> >>             start = -1;
> >>             count = 0;
> >>         }
> >>     }
> >> }
> >>
> >> ...
> >>
> >> TopologyBuilder topologyBuilder = new TopologyBuilder();
> >> topologyBuilder
> >>     .addSource("SOURCE", stringDeserializer, stringDeserializer, "input")
> >>     .addProcessor("PROCESS", TestProcessor::new, "SOURCE");
> >>
> >> Which I then ran through the KafkaStreams API, and then repeated with
> >> the KafkaConsumer API.
> >>
> >> Using the KafkaConsumer API:
> >> Processed 1000001 records in 1.790000 seconds (558659.776536 records/s)
> >> Processed 1000001 records in 1.229000 seconds (813670.463792 records/s)
> >> Processed 1000001 records in 1.106000 seconds (904160.036166 records/s)
> >> Processed 1000001 records in 1.190000 seconds (840336.974790 records/s)
> >>
> >> Using the KafkaStreams API:
> >> Processed 1000001 records in 6.407000 seconds (156079.444358 records/s)
> >> Processed 1000001 records in 5.256000 seconds (190258.942161 records/s)
> >> Processed 1000001 records in 5.141000 seconds (194514.880373 records/s)
> >> Processed 1000001 records in 5.111000 seconds (195656.622970 records/s)
> >>
> >> The profile on the KafkaStreams run consisted of:
> >>
> >> 89.2% org.apache.kafka.common.network.Selector.select()
> >>  7.6% org.apache.kafka.clients.producer.internals.ProduceRequestResult.await()
> >>  0.8% org.apache.kafka.common.network.PlaintextTransportLayer.read()
> >>
> >> Is a 5X performance difference between the KafkaConsumer and the
> >> KafkaStreams API expected?
> >>
> >> Are there specific things I can do to diagnose/tune the system?
> >>
> >> Thanks,
> >> Caleb
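[Editor's aside on the numbers quoted above: each "records/s" figure is simply the record count divided by the elapsed time in seconds, exactly as the printf in TestProcessor computes it. A minimal, self-contained sketch of that arithmetic, reproducing the first sample from each run; the class and method names here are illustrative, not part of Kafka.]

```java
// Sketch of the throughput calculation behind the benchmark output above.
// recordsPerSecond() mirrors the count/time expression in TestProcessor.
public class Throughput {
    static double recordsPerSecond(long count, long elapsedMillis) {
        double seconds = elapsedMillis / 1000.0; // same millis-to-seconds step as TestProcessor
        return count / seconds;
    }

    public static void main(String[] args) {
        // First KafkaConsumer sample in the thread: 1000001 records in 1.790 s
        System.out.printf("%f records/s%n", recordsPerSecond(1000001L, 1790L));  // 558659.776536
        // First KafkaStreams sample: 1000001 records in 6.407 s
        System.out.printf("%f records/s%n", recordsPerSecond(1000001L, 6407L));  // 156079.444358
    }
}
```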