Re: Triggers based on size

Robert Bradshaw Wed, 10 Jan 2018 01:14:15 -0800

Unfortunately, the metadata-driven trigger is still just an idea; it is
not yet implemented.
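
To make the idea concrete anyway, here is a minimal plain-Python sketch of
what such a trigger might look like. It is not a Beam API: the names
MetadataTrigger and panes are hypothetical, and the "CombineFn" is
simplified to a running sum compared against a threshold, which stays
monotonic only as long as the per-element metadata is non-negative.

```python
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class MetadataTrigger(Generic[T]):
    """Hypothetical model of a metadata-driven trigger (not part of Beam)."""

    def __init__(self, to_metadata: Callable[[T], int], threshold: int):
        self._to_metadata = to_metadata  # mapping: element -> metadata
        self._threshold = threshold
        self._total = 0

    def observe(self, element: T) -> bool:
        """Fold the element's metadata into the sum; True means 'fire'."""
        self._total += self._to_metadata(element)
        return self._total >= self._threshold

    def reset(self) -> None:
        self._total = 0


def panes(elements: Iterable[T], trigger: MetadataTrigger) -> Iterator[List[T]]:
    """Group elements into panes, emitting a pane whenever the trigger fires."""
    pane: List[T] = []
    for element in elements:
        pane.append(element)
        if trigger.observe(element):
            yield pane
            pane = []
            trigger.reset()
    if pane:  # emit whatever is left when the input ends
        yield pane


# AfterCount-like behaviour: every element maps to 1, fire at 2 elements.
by_count = list(panes(["a", "bb", "ccc", "d"], MetadataTrigger(lambda _: 1, 2)))

# Size-based behaviour: every element maps to its length, fire at 3 units.
by_size = list(panes(["a", "bb", "ccc", "d"], MetadataTrigger(len, 3)))
```

The only difference between the count-based and the size-based variant is
the mapping function, which is the point of the proposal.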


A good introduction to state and timers (the approach Kenn suggested) can
be found at
https://beam.apache.org/blog/2017/08/28/timely-processing.html
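
As a rough illustration of the logic you would put into a stateful ParDo,
here is a plain-Python model of per-key buffering that flushes either when
the buffered bytes reach a limit or when a deadline passes. This is only a
sketch of the bookkeeping, not Beam code: SizeBoundedBuffer, process and
on_timer are hypothetical names, the dictionaries stand in for per-key
state, and the deadline stands in for a timer.

```python
from typing import Dict, List, Tuple

Batch = Tuple[str, List[bytes]]


class SizeBoundedBuffer:
    """Hypothetical model of size-or-time flushing with per-key state."""

    def __init__(self, max_bytes: int, max_delay_s: float):
        self.max_bytes = max_bytes
        self.max_delay_s = max_delay_s
        self._buffers: Dict[str, List[bytes]] = {}  # stands in for bag state
        self._sizes: Dict[str, int] = {}            # stands in for value state
        self._deadlines: Dict[str, float] = {}      # stands in for a timer

    def process(self, key: str, payload: bytes, now: float) -> List[Batch]:
        """Buffer one element; return a batch if the size limit is reached."""
        buffered = self._buffers.setdefault(key, [])
        if not buffered:
            # First element for this key: arm the "timer".
            self._deadlines[key] = now + self.max_delay_s
        buffered.append(payload)
        self._sizes[key] = self._sizes.get(key, 0) + len(payload)
        if self._sizes[key] >= self.max_bytes:
            return [(key, self._flush(key))]
        return []

    def on_timer(self, now: float) -> List[Batch]:
        """Flush every key whose deadline has passed."""
        due = [key for key, deadline in self._deadlines.items() if deadline <= now]
        return [(key, self._flush(key)) for key in due]

    def _flush(self, key: str) -> List[bytes]:
        # Clear all state for the key and hand back the buffered elements.
        self._sizes.pop(key, None)
        self._deadlines.pop(key, None)
        return self._buffers.pop(key)
```

Keying by message type would bound the memory held per destination, at the
cost of the Combine optimizations Kenn mentions below.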

On Wed, Jan 10, 2018 at 1:08 AM, Carlos Alonso <car...@mrcalonso.com> wrote:
> Hi Robert, Kenneth.
>
> Thanks a lot to both of you for your responses!!
>
> Kenneth, unfortunately I'm not sure we're experienced enough with Apache
> Beam to get anywhere close to your suggestion, but thanks anyway!!
>
> Robert, your suggestion sounds great to me. Could you please provide an
> example of how to use that 'metadata driven' trigger?
>
> Thanks!
>
> On Tue, Jan 9, 2018 at 9:11 PM Kenneth Knowles <k...@google.com> wrote:
>>
>> Often, when you need or want more control than triggers provide, such as
>> input-type-specific logic like yours, you can use state and timers in ParDo
>> to control when to output. You lose any potential optimizations of Combine
>> based on associativity/commutativity and assume the burden of making sure
>> your output is sensible, but dropping to low-level stateful computation may
>> be your best bet.
>>
>> Kenn
>>
>> On Tue, Jan 9, 2018 at 11:59 AM, Robert Bradshaw <rober...@google.com>
>> wrote:
>>>
>>> We've tossed around the idea of "metadata-driven" triggers, which would
>>> essentially let you provide a mapping element -> metadata and a
>>> monotonic CombineFn metadata* -> bool that would allow for this.
>>> AfterCount is a special case of this, with the mapping fn being
>>> _ -> 1 and the CombineFn being sum(...) >= N. For size, one would
>>> provide a (perhaps approximate) sizing mapping fn.
>>>
>>> Note, however, that there's no guarantee that the trigger will fire as
>>> soon as possible; due to runtime characteristics, a significant amount
>>> of data may be buffered (or come in at once) before the trigger is
>>> queried. One possibility would be to follow your triggering with a
>>> DoFn that breaks up large value streams into multiple manageably sized
>>> ones as needed.
>>>
>>> On Tue, Jan 9, 2018 at 11:43 AM, Carlos Alonso <car...@mrcalonso.com>
>>> wrote:
>>> > Hi everyone!!
>>> >
>>> > I was wondering if there is an option to trigger window panes based on
>>> > the size of the pane itself (rather than the number of elements).
>>> >
>>> > To provide a little more context: we're backing up a PubSub topic into
>>> > GCS with the "special" feature that the GCS destination depends on the
>>> > "type" of the message.
>>> >
>>> > The 'shape' of the messages published there is quite random: some are
>>> > very frequent and small, others very big but sparse. We have around 150
>>> > messages per second (in total), and we're firing every 15 minutes and
>>> > experiencing OOM errors. We've considered firing based on the number of
>>> > items as well, but given the randomness of the input, I don't think it
>>> > will be a final solution either.
>>> >
>>> > Having a trigger based on size would be great. Another option would be
>>> > to have a dynamic number of shards for the PTransform that actually
>>> > writes the files.
>>> >
>>> > What is your recommendation for this use case?
>>> >
>>> > Thanks!!
>>
>>
>
