Hi,

using the streams library I noticed a difference (or there is a lack of 
knowledge on my side)with Apache Spark.

Imagine following scenario ...


I have a source topic where numeric values come in and I want to check the 
maximum value in the latest 5 seconds but ... putting the max value into a 
destination topic every 5 seconds.

This is what happens with reduceByWindow method in Spark.

I'm using reduce on a KStream here that process the max value taking into 
account previous values in the latest 5 seconds but the final value is put into 
the destination topic for each incoming value.


For example ...


An application sends numeric values every 1 second.

With Spark ... the source gets values every 1 second, process max in a window 
of 5 seconds, puts the max into the destination every 5 seconds (so when the 
window ends). If the sequence is 21, 25, 22, 20, 26 the output will be just 26.

With Kafka Streams ... the source gets values every 1 second, process max in a 
window of 5 seconds, puts the max into the destination every 1 seconds (so 
every time an incoming value arrives). Of course, if for example the sequence 
is 21, 25, 22, 20, 26 ... the output will be 21, 25, 25, 25, 26.


Is it possible with Kafka Streams ? Or it's something to do at application 
level ?


Thanks,

Paolo


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno<http://twitter.com/ppatierno>
Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
Blog : DevExperience<http://paolopatierno.wordpress.com/>

Reply via email to