Hello there,

I've been experimenting with the Kafka Streams preview, and I'm excited
about its features and capabilities! My team is enthusiastic about the
lightweight operational profile, and the support for local state is very
compelling.

However, I'm having trouble designing a solution with KStreams to satisfy a
particular use-case: we want to "sessionize" a stream of events by gathering
together inputs that share a common identifier and occur without a
configurable interruption (gap) in event-time.
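To make the semantics concrete, here's a plain-Java sketch of what I mean
(no Kafka Streams APIs involved; the class, method, and parameter names are
just illustrative). Given the time-ordered event timestamps for a single
key, consecutive events belong to the same session as long as the gap
between them stays within the threshold:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SessionizeSketch {

    // Split a time-ordered list of event timestamps (all for one key)
    // into sessions: consecutive events whose gap is <= maxGapMs share
    // a session; a larger gap starts a new one.
    static List<List<Long>> sessionize(List<Long> timestamps, long maxGapMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        Long prev = null;
        for (long ts : timestamps) {
            if (prev != null && ts - prev > maxGapMs) {
                // Gap exceeded: close the current session, open a new one.
                sessions.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
            prev = ts;
        }
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }

    public static void main(String[] args) {
        // Events for one id, with a 30-second gap threshold: the jump
        // from 25s to 70s exceeds the gap, so we expect two sessions.
        List<Long> events = Arrays.asList(0L, 10_000L, 25_000L, 70_000L, 80_000L);
        List<List<Long>> sessions = sessionize(events, 30_000L);
        System.out.println(sessions.size());        // prints 2
        System.out.println(sessions.get(0).size()); // prints 3
        System.out.println(sessions.get(1).size()); // prints 2
    }
}
```

Of course, doing this batch-style over a sorted list is trivial; the hard
part is doing it incrementally over an unbounded stream, where a session
can only be emitted once the gap has elapsed.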

This is achievable with other streaming frameworks (e.g., Beam/Dataflow's
"Session" windows, or Spark Streaming's mapWithState with its "timeout"
capability), but I don't see how to approach it with the current Kafka
Streams API.

I've investigated using the aggregateWithKey function, but it doesn't
appear to support data-driven windowing. I've also considered using a
custom Processor to perform the aggregation, but I don't see how to take an
output stream from a Processor and continue working with it. This area of
the system is undocumented, so I'm not sure how to proceed.

Am I missing something? Do you have any suggestions?

-josh
