Often, when you need or want more control than triggers provide, such as input-type-specific logic like yours, you can use state and timers in ParDo to control when to output. You lose any potential optimizations of Combine based on associativity/commutativity and assume the burden of making sure your output is sensible, but dropping to low-level stateful computation may be your best bet.
Kenn On Tue, Jan 9, 2018 at 11:59 AM, Robert Bradshaw <[email protected]> wrote: > We've tossed around the idea of "metadata-driven" triggers which would > essentially let you provide a mapping element -> metadata and a > monotonic CombineFn metadata* -> bool that would allow for this (the > AfterCount being a special case of this, with the mapping fn being _ > -> 1, and the CombineFn being sum(...) >= N, for size one would > provide a (perhaps approximate) sizing mapping fn). > > Note, however, that there's no guarantee that the trigger fire as soon > as possible; due to runtime characteristics a significant amount of > data may be buffered (or come in at once) before the trigger is > queried. One possibility would be to follow your triggering with a > DoFn that breaks up large value streams into multiple manageable sized > ones as needed. > > On Tue, Jan 9, 2018 at 11:43 AM, Carlos Alonso <[email protected]> > wrote: > > Hi everyone!! > > > > I was wondering if there is an option to trigger window panes based on > the > > size of the pane itself (rather than the number of elements). > > > > To provide a little bit more of context we're backing up a PubSub topic > into > > GCS with the "special" feature that, depending on the "type" of the > message, > > the GCS destination is one or another. > > > > Messages' 'shape' published there is quite random, some of them are very > > frequent and small, some others very big but sparse... We have around 150 > > messages per second (in total) and we're firing every 15 minutes and > > experiencing OOM errors, we've considered firing based on the number of > > items as well, but given the randomness of the input, I don't think it > will > > be a final solution either. > > > > Having a trigger based on size would be great, another option would be to > > have a dynamic shards number for the PTransform that actually writes the > > files. > > > > What is your recommendation for this use case? > > > > Thanks!! >
