Re: What does kafka streams groupBy does internally?

Matthias J. Sax Tue, 30 Jan 2024 11:05:59 -0800

Did reply on SO.

-Matthias


On 1/24/24 2:18 AM, [email protected] wrote:

Let's say there's a topic in which chunks of different files are all mixed up, each represented by a tuple |(FileId, Chunk)|.
Chunks of the same file can also be slightly out of order.

The task is to aggregate all files and store them into some store.

The number of files is unbounded.

In pseudo stream DSL that might look like:

    topic('chunks')
        .groupByKey((fileId, chunk) -> fileId)
        .sortBy((fileId, chunk) -> chunk.offset)
        .aggregate((fileId, chunk) -> store.append(fileId, chunk));

I want to understand whether Kafka Streams can solve this efficiently. Since the number of files is unbounded, how would Kafka Streams manage intermediate topics for the groupBy operation? How many partitions will it use, etc.? I can't find these details in the docs.

Also, let's say a chunk has a flag that indicates EOF. How do I indicate that a specific group will no longer have any new data?
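
To make the task concrete, here is a minimal plain-Java sketch of the grouping, ordering, and EOF handling described above. It does not use Kafka Streams and says nothing about how groupBy works internally. The `Chunk` fields (`fileId`, `offset`, `data`, `eof`) and the `ChunkAssembler` class are hypothetical names, and the sketch assumes offsets are byte offsets starting at 0 with the EOF flag set on the last chunk of a file.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical chunk shape; the question only specifies (FileId, Chunk) plus an EOF flag. */
final class Chunk {
    final String fileId;
    final long offset;   // assumed: byte offset of this chunk within its file
    final byte[] data;
    final boolean eof;   // assumed: true only on the last chunk of the file

    Chunk(String fileId, long offset, byte[] data, boolean eof) {
        this.fileId = fileId;
        this.offset = offset;
        this.data = data;
        this.eof = eof;
    }
}

/** Buffers chunks per file, sorted by offset, and releases a file once it is complete. */
final class ChunkAssembler {
    // fileId -> (offset -> chunk); the TreeMap keeps each file's chunks sorted by offset
    private final Map<String, TreeMap<Long, Chunk>> pending = new HashMap<>();

    /** Returns the assembled bytes if this chunk completes its file, otherwise null. */
    byte[] accept(Chunk chunk) {
        TreeMap<Long, Chunk> chunks =
                pending.computeIfAbsent(chunk.fileId, id -> new TreeMap<>());
        chunks.put(chunk.offset, chunk);
        if (!isComplete(chunks)) {
            return null;
        }
        // The group is finished: drop its state so memory does not grow with the file count.
        pending.remove(chunk.fileId);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : chunks.values()) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }

    /** Complete means: no gaps from offset 0 and the highest-offset chunk carries EOF. */
    private static boolean isComplete(TreeMap<Long, Chunk> chunks) {
        long expected = 0;
        for (Chunk c : chunks.values()) {
            if (c.offset != expected) {
                return false; // a chunk is still missing
            }
            expected += c.data.length;
        }
        return chunks.lastEntry().getValue().eof;
    }

    /** Number of files that still have buffered, incomplete state. */
    int pendingFiles() {
        return pending.size();
    }
}
```

Feeding chunks through `accept` in arrival order returns a file's bytes only once the EOF chunk and all earlier chunks have been seen, so an EOF chunk that arrives early simply waits in the buffer. Removing the per-file entry on completion is what keeps the state bounded in this sketch even though the number of files is not.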
That’s a copy of my Stack Overflow question.
What does kafka streams groupBy does internally? <https://stackoverflow.com/questions/77870807/what-does-kafka-streams-groupby-does-internally>


—
Michael
