Hi all, I'm currently investigating Beam for our company's needs, and I have a question about how to solve a specific windowing problem in python.
I have a stream of elements and I want to gather them until a special end-of-stream element arrives. To solve this, I've written a modified version of window.Sessions that takes an optional callable which is passed an element when it is being assigned to a window. If the callable returns True, the window is immediately closed. Here's the code: https://gist.github.com/chadrik/b0dfff8953fed99f99bdd69c7cc870ba It works as expected in the Direct runner, but fails in Dataflow, which I'm pretty sure is due to a serialization problem. So, I have a few questions: 1) Is there a way to do this in python using stock components? (i.e. without my custom class) 2) If not, is there any interest in accepting a PR to modify the stock window.Sessions to do something like what I've written? It seems very useful, and I've seen other people attempting to solve this same problem [1][2] 3) If not, how do I go about serializing my custom WindowFn class? From looking at the source code I'm certain I could figure out how to extend the serialization for a stock object like window.Sessions (modify standard_window_fns.proto, and update to/from_runner_api_parameter), but it's very unclear how I would do this for a custom WindowFn, since serialization of these classes seems to be part of the official beam gRPC portability protocol. Thanks in advance for your help! -chad 1. https://stackoverflow.com/questions/49011360/using-a-custom-windowfn-on-google-dataflow-which-closes-after-an-element-value 2. https://stackoverflow.com/questions/43035302/close-window-after-based-on-element-value/43052779#43052779
