Unfortunately, this notion isn't applicable: "...At the end of a time window..."
If you comb through the archives of this group, you'll see many questions about notifications for the 'end of an aggregation window' and a similar number of replies from the Kafka group stating that such a notion doesn't really exist. Each window is kept open so that late-arriving records can be incorporated. You can specify the lifetime of a given window, but you don't get any sort of signal when it expires. A record that arrives after that expiration will trigger the creation of a new window.

On Wed, Jul 12, 2017 at 5:06 PM, Stephen Powis <spo...@salesforce.com> wrote:
> Hey! I was hoping I could get some input from people more experienced with
> Kafka Streams to determine if they'd be a good use case/solution for me.
>
> I have multi-tenant clients submitting data to a Kafka topic that they want
> ETL'd to a third party service. I'd like to batch and group these by
> tenant over a time window, somewhere between 1 and 5 minutes. At the end
> of a time window then issue an API request to the third party service for
> each tenant sending the batch of data over.
>
> Other points of note:
> - Ideally we'd have exactly-once semantics, sending data multiple times
> would typically be bad. But we'd need to gracefully handle things like API
> request errors / service outages.
>
> - We currently use Storm for doing stream processing, but the long running
> time-windows and potentially large amount of data stored in memory make me
> a bit nervous to use it for this.
>
> Thoughts? Thanks in Advance!
> Stephen
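To make the window behaviour described in the reply concrete, here is a toy, in-memory Java model of it. It is not the Kafka Streams API and does not reproduce its internals: the class `WindowRetentionSketch`, its methods, keying windows by `tenant@windowStart`, and treating the largest timestamp seen so far as "stream time" are all hypothetical assumptions for illustration. It shows the three points above: a late record within the window's lifetime joins its existing window, an expired window is dropped without any notification, and a record arriving after expiry simply opens a new window.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the windowing behaviour described in the reply.
// NOT the Kafka Streams API: all names here are hypothetical.
public class WindowRetentionSketch {
    private static final class Window {
        final long start;
        final List<String> records = new ArrayList<>();
        Window(long start) { this.start = start; }
    }

    private final long sizeMs;      // tumbling window length, e.g. 1 to 5 minutes
    private final long retentionMs; // how long a window stays open for late records
    private final Map<String, Window> open = new HashMap<>(); // key: tenant@windowStart
    private long streamTime = Long.MIN_VALUE; // assumption: max timestamp seen so far
    private int windowsCreated = 0;

    public WindowRetentionSketch(long sizeMs, long retentionMs) {
        this.sizeMs = sizeMs;
        this.retentionMs = retentionMs;
    }

    public void add(String tenant, long timestampMs, String value) {
        streamTime = Math.max(streamTime, timestampMs);
        // Expired windows are silently discarded: no callback, no "window closed" event.
        open.values().removeIf(w -> w.start + retentionMs <= streamTime);
        long start = timestampMs - Math.floorMod(timestampMs, sizeMs);
        String key = tenant + "@" + start;
        Window w = open.get(key);
        if (w == null) {
            // Either the first record for this window, or a record arriving after the
            // original window expired: both cases just start a fresh window.
            w = new Window(start);
            open.put(key, w);
            windowsCreated++;
        }
        w.records.add(value);
    }

    public List<String> recordsFor(String tenant, long windowStart) {
        Window w = open.get(tenant + "@" + windowStart);
        return w == null ? Collections.emptyList() : Collections.unmodifiableList(w.records);
    }

    public int windowsCreated() { return windowsCreated; }

    public static void main(String[] args) {
        WindowRetentionSketch s = new WindowRetentionSketch(60_000, 120_000);
        s.add("acme", 10_000, "a");        // opens acme@0
        s.add("acme", 70_000, "b");        // opens acme@60000
        s.add("acme", 30_000, "late");     // late, but acme@0 is still retained
        System.out.println(s.recordsFor("acme", 0)); // [a, late]
        s.add("acme", 200_000, "c");       // stream time advances; both old windows expire
        s.add("acme", 20_000, "too-late"); // acme@0 is gone, so a new window is created
        System.out.println(s.recordsFor("acme", 0)); // [too-late]
        System.out.println(s.windowsCreated());      // 4
    }
}
```

Because nothing fires when a window is dropped in this model, there is no natural hook for "send this tenant's batch now"; the only observable events are records being added, which is the gap the reply is pointing out.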