Often, when you need or want more control than triggers provide, such as
input-type-specific logic like yours, you can use state and timers in ParDo
to control when to output. You lose any potential optimizations of Combine
based on associativity/commutativity and assume the burden of making sure
your output is sensible, but dropping to low-level stateful computation may
be your best bet.

Kenn

On Tue, Jan 9, 2018 at 11:59 AM, Robert Bradshaw <[email protected]>
wrote:

> We've tossed around the idea of "metadata-driven" triggers which would
> essentially let you provide a mapping element -> metadata and a
> monotonic CombineFn metadata* -> bool that would allow for this (the
> AfterCount being a special case of this, with the mapping fn being _
> -> 1, and the CombineFn being sum(...) >= N, for size one would
> provide a (perhaps approximate) sizing mapping fn).
>
> Note, however, that there's no guarantee that the trigger fire as soon
> as possible; due to runtime characteristics a significant amount of
> data may be buffered (or come in at once) before the trigger is
> queried. One possibility would be to follow your triggering with a
> DoFn that breaks up large value streams into multiple manageable sized
> ones as needed.
>
> On Tue, Jan 9, 2018 at 11:43 AM, Carlos Alonso <[email protected]>
> wrote:
> > Hi everyone!!
> >
> > I was wondering if there is an option to trigger window panes based on
> the
> > size of the pane itself (rather than the number of elements).
> >
> > To provide a little bit more of context we're backing up a PubSub topic
> into
> > GCS with the "special" feature that, depending on the "type" of the
> message,
> > the GCS destination is one or another.
> >
> > Messages' 'shape' published there is quite random, some of them are very
> > frequent and small, some others very big but sparse... We have around 150
> > messages per second (in total) and we're firing every 15 minutes and
> > experiencing OOM errors, we've considered firing based on the number of
> > items as well, but given the randomness of the input, I don't think it
> will
> > be a final solution either.
> >
> > Having a trigger based on size would be great, another option would be to
> > have a dynamic shards number for the PTransform that actually writes the
> > files.
> >
> > What is your recommendation for this use case?
> >
> > Thanks!!
>

Reply via email to