Greetings, I'm trying to do the following:
I'm streaming JSON-encoded messages from Pub/Sub. Each message carries DB fields, including a primary key and an operation type whose value is one of insert, update, or delete, reflecting the database operation. The goal is to produce one file per day, updated every 30 minutes, that contains the final state of the rows/messages. Each 30-minute flush should reflect the state of the data at that point, taking the operations into account: say an insert for some row arrives in the first 30 minutes, so the row appears in the first flush of the output file; if a delete for that row arrives in the next 30 minutes, the row should no longer be in the output file at the second flush, and so on. For the next day, a new file should be created.

I'm wondering whether something like this can be done in Beam (Java). I had in mind a fixed 24-hour window (which I guess is not best practice) that would trigger every 30 minutes, followed by a group-by-key on the primary key and then logic to apply the operations. But it looks like triggering can't be done that way, or maybe my whole approach/understanding is wrong... Anyway, I would be grateful for any advice on how this can be done (conceptually), or a code sample of a similar pipeline.
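Here is a rough, untested sketch of the windowing I had in mind. RowEvent is a made-up name for my decoded JSON message (primary key, operation, plus a sequence number from the source so the latest change per key wins), so the names below are assumptions, not working code:

import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// "keyed" comes from PubsubIO plus a ParDo that parses the JSON and
// keys each RowEvent by its primary key.
static PCollection<KV<String, RowEvent>> latestLiveRows(
    PCollection<KV<String, RowEvent>> keyed) {
  return keyed
      // one window per calendar day, with an early firing every 30 minutes
      .apply(Window.<KV<String, RowEvent>>into(FixedWindows.of(Duration.standardDays(1)))
          .triggering(AfterWatermark.pastEndOfWindow()
              .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                  .plusDelayOf(Duration.standardMinutes(30))))
          // accumulate, so every 30-minute pane sees all events received so far today
          .accumulatingFiredPanes()
          .withAllowedLateness(Duration.ZERO))
      // keep only the most recent event per primary key
      .apply(Combine.<String, RowEvent>perKey(events -> {
        RowEvent latest = null;
        for (RowEvent e : events) {
          if (latest == null || e.getSequence() > latest.getSequence()) {
            latest = e;
          }
        }
        return latest;
      }))
      // a row whose latest operation is a delete should not appear in the output
      .apply(Filter.by((KV<String, RowEvent> kv) ->
          !"delete".equals(kv.getValue().getOperation())));
}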
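For writing the files, I thought of something like the fragment below (continuing from the liveRows output of the sketch above). TextIO's default windowed naming puts the window and pane into the file name, so each 30-minute firing would produce a new file; if I understand correctly, getting a single daily file that is replaced on every flush would instead need a custom FileBasedSink.FilenamePolicy that names files by window only. The bucket path and the toJson serializer here are made up:

import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

liveRows
    // toJson is my own (hypothetical) serializer back to a JSON line
    .apply(MapElements.into(TypeDescriptors.strings())
        .via((KV<String, RowEvent> kv) -> toJson(kv.getValue())))
    .apply(TextIO.write()
        .to("gs://my-bucket/daily-state/state")  // made-up output location
        .withWindowedWrites()
        .withNumShards(1));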
Best regards,
Zdenko

--
_______________________
http://www.the-swamp.info