Did reply on SO.
-Matthias
On 1/24/24 2:18 AM, warrior2...@gmail.com wrote:
Let's say there's a topic in which chunks of different files are
interleaved, each record represented by a tuple |(FileId, Chunk)|.
Chunks of the same file can also arrive slightly out of order.
The task is to reassemble each file and write it to some store.
The number of files is unbounded.
In a pseudo stream DSL, that might look like:
|topic('chunks')
    .groupByKey((fileId, chunk) -> fileId)
    .sortBy((fileId, chunk) -> chunk.offset)
    .aggregate((fileId, chunk) -> store.append(fileId, chunk)); |
I want to understand whether Kafka Streams can solve this efficiently.
Since the number of files is unbounded, how would Kafka manage the
intermediate topics for the groupBy operation? How many partitions would
it use, etc.? I can't find these details in the docs. Also, let's say a
chunk has a flag that indicates EOF. How can I indicate that a specific
group will no longer receive any new data?
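To make the ordering and EOF parts of the question concrete, here is a minimal plain-Java sketch of the per-file reordering logic. It uses no Kafka API. ChunkReassembler, Chunk and openFiles are hypothetical names, and it assumes that chunk.offset is a contiguous character offset starting at 0 and that the EOF flag sits on the last chunk of the file. State is kept per fileId and dropped once the EOF chunk has been emitted, so the number of files does not need to be known up front.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Kafka Streams.
class ChunkReassembler {

    // Assumed chunk shape: offset within the file, payload, EOF flag.
    static final class Chunk {
        final long offset;
        final String data;
        final boolean eof;

        Chunk(long offset, String data, boolean eof) {
            this.offset = offset;
            this.data = data;
            this.eof = eof;
        }
    }

    // Per-file state: chunks that arrived early, plus the next offset we expect.
    private static final class FileState {
        final TreeMap<Long, Chunk> pending = new TreeMap<>();
        long nextOffset = 0;
    }

    private final Map<String, FileState> files = new HashMap<>();

    // Buffers one chunk and returns the chunks of that file that are now
    // contiguous with what was already emitted, in offset order.
    List<Chunk> accept(String fileId, Chunk chunk) {
        FileState state = files.computeIfAbsent(fileId, id -> new FileState());
        state.pending.put(chunk.offset, chunk);

        List<Chunk> ready = new ArrayList<>();
        while (!state.pending.isEmpty()
                && state.pending.firstKey() == state.nextOffset) {
            Chunk next = state.pending.pollFirstEntry().getValue();
            ready.add(next);
            state.nextOffset += next.data.length();
            if (next.eof) {
                // The file is complete: forget its state entirely.
                files.remove(fileId);
                break;
            }
        }
        return ready;
    }

    // Number of files that still have state (no EOF emitted yet).
    int openFiles() {
        return files.size();
    }

    public static void main(String[] args) {
        ChunkReassembler r = new ChunkReassembler();
        r.accept("f1", new Chunk(3, "def", false)); // arrives early, is buffered
        for (Chunk c : r.accept("f1", new Chunk(0, "abc", false))) {
            System.out.println(c.offset + ": " + c.data);
        }
    }
}
```

In a stream processor, the same idea would presumably live in keyed state, with the returned chunks passed to something like store.append(fileId, chunk) from the pseudo DSL above.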
That’s a copy of my Stack Overflow question.
What does kafka streams groupBy does internally?
<https://stackoverflow.com/questions/77870807/what-does-kafka-streams-groupby-does-internally>
—
Michael