Hi, I was reading about Kafka Streams and trying to understand its programming model. Here are some observations I wanted to get clarity on.
1) Joins & aggregations use an internal topic for the shuffle. Source processors write to this topic with the key used for the join, and are then free to commit the offsets for that run. Does that mean we have to rely on the internal topic's replication to guarantee at-least-once processing? Won't these replicated internal topics put too much load on the brokers? We would have to scale up the brokers whenever our stream-processing needs increase. It's easy to add/remove processing nodes running KStream app instances, but adding/removing brokers is not that straightforward.

2) How many partitions are created for the internal topic? What is the retention policy? When/how is it created, and how is it cleaned up when the streaming application is shut down permanently? Any link with a deep dive into this would be helpful.

3) I'm a bit confused about the threading model. Let's take the example below. Assume "PageViews" and "UserProfile" have four partitions each. I start two instances of this app, each with two threads, so we have four threads altogether. My understanding is that each thread will receive records from one partition of each topic. After reading, they'll do the map, filter, etc. and write to the internal topic for the join. Then the threads go on to read the next set of records from the previous offset. I guess this can be viewed as some sort of chaining within a stage. Now, where do the threads that read from the internal topic run? Do they share the same set of threads? Can we increase the parallelism for just this processor, if I know that the join().map().process() chain is more compute intensive?

KStream<String, GenericRecord> views = builder.stream("PageViews")
    .map(...)
    .filter(...);

KTable<String, GenericRecord> users = builder.table("UserProfile")
    .mapValues(...);

KStream<String, Long> regionCount = views
    .leftJoin(users, ...)
    .map(...)
    .process(...);

I hope I was able to explain my questions clearly enough.

Thanks,
Srikanth
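P.S. In case it helps, this is roughly how I'm configuring and starting each instance (written against the 0.10 API; the application id, broker address and thread count are just placeholder values I picked for illustration, and "builder" is the same builder used in the snippet above):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pageview-region-count");  // placeholder app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");        // placeholder broker list
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);                    // two stream threads per instance

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();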