Hi,

We are trying to implement the following use case: We have a stream of DataX events that arrive every 5 minutes and require some processing. Each event holds data for a specific, non-unique ID (we keep receiving updated data for each ID). There might be up to 1,000,000 IDs. In addition, there is a stream of DataY events for some of these IDs, which arrive at a variable frequency: one might arrive after a minute, and the next only after 5 hours. We would like to join the current DataX event with the latest DataY event by ID, and process only IDs that have both a DataX and a DataY event.
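To make the intended join semantics concrete, here is a minimal plain-Java sketch of the behavior we are after. It is deliberately not Beam code and says nothing about how to express this with windows or triggers. The class name `LatestJoinSketch`, the `payload` and `timestamp` fields, and the method names are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatestJoinSketch {
    // Hypothetical event shapes; the real DataX/DataY fields are not shown here.
    public static final class DataX {
        public final String id; public final long timestamp; public final String payload;
        public DataX(String id, long timestamp, String payload) {
            this.id = id; this.timestamp = timestamp; this.payload = payload;
        }
    }

    public static final class DataY {
        public final String id; public final long timestamp; public final String payload;
        public DataY(String id, long timestamp, String payload) {
            this.id = id; this.timestamp = timestamp; this.payload = payload;
        }
    }

    public static final class Joined {
        public final DataX x; public final DataY y;
        public Joined(DataX x, DataY y) { this.x = x; this.y = y; }
    }

    // Latest (by timestamp) DataY event seen so far, per ID.
    private final Map<String, DataY> latestY = new HashMap<>();

    // Called for every DataY event; keeps only the newest one per ID,
    // so an event that arrives late with an older timestamp is ignored.
    public void onDataY(DataY y) {
        latestY.merge(y.id, y, (old, neu) -> neu.timestamp >= old.timestamp ? neu : old);
    }

    // Called for each 5-minute batch of DataX events; emits only the IDs
    // that already have a DataY event, paired with the latest one.
    public List<Joined> onDataXBatch(List<DataX> batch) {
        List<Joined> out = new ArrayList<>();
        for (DataX x : batch) {
            DataY y = latestY.get(x.id);
            if (y != null) {
                out.add(new Joined(x, y));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LatestJoinSketch sketch = new LatestJoinSketch();
        sketch.onDataY(new DataY("id-1", 10L, "y-old"));
        sketch.onDataY(new DataY("id-1", 20L, "y-new"));
        List<DataX> batch = new ArrayList<>();
        batch.add(new DataX("id-1", 300L, "x-1")); // has a DataY, so it is joined
        batch.add(new DataX("id-2", 300L, "x-2")); // no DataY yet, so it is dropped
        for (Joined j : sketch.onDataXBatch(batch)) {
            System.out.println(j.x.id + ": " + j.x.payload + " + " + j.y.payload);
        }
    }
}
```

The important point is that the latest DataY event for an ID must stay available for every later DataX batch, not only for the batch that follows its arrival.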
We thought of holding the DataY events per ID as state in a global window, and then using that state as a side input for filtering the DataX event stream. The state should hold the latest (by timestamp) DataY event that has arrived for each ID.

The problem is this: if we use discardingFiredPanes(), each DataY event is fired only once and cannot be reused later for filtering. On the other hand, if we use accumulatingFiredPanes(), every firing contains the list of all DataY events that have arrived so far, rather than only the latest one.

Are we missing something? What is the best practice for combining two streams when one of them arrives at a variable frequency?

Thanks,
Ifat