Hi Nhan,

To get a global view over all events, you can use a non-keyed TumblingWindow and a ProcessAllWindowFunction.

Inside the ProcessAllWindowFunction, calculate the min/max/count of the elements in that window, compare them to the existing values in the global state, then write the updated min/max/count back to the global state so the next window can use them.

You can also make the min/max/count available downstream by emitting them together with the window's items.
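A minimal plain-Java sketch of that per-window bookkeeping, assuming each window's elements are handed over as a list of their keys. The Flink wiring (the non-keyed window and the ProcessAllWindowFunction subclass) is left out, the class and method names are hypothetical, and the HashMap only stands in for the global state. Because a key's count can keep growing across windows, the sketch keeps the per-key counts themselves in that state rather than only the previous min/max.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free sketch of the logic a ProcessAllWindowFunction
// could run once per tumbling window. The map plays the role of the
// global state that survives from one window to the next.
class GlobalWindowStats {

    // key -> number of messages seen for that key across all windows so far
    private final Map<String, Long> countsPerKey = new HashMap<>();

    // total number of messages seen across all windows so far
    private long totalCount = 0;

    // Folds one window's elements (given by their keys) into the running state
    // and returns {totalCount, maxPerKey, minPerKey} to emit downstream.
    long[] processWindow(List<String> windowKeys) {
        for (String key : windowKeys) {
            countsPerKey.merge(key, 1L, Long::sum);
            totalCount++;
        }
        if (countsPerKey.isEmpty()) {
            return new long[] {0L, 0L, 0L};
        }
        long max = Collections.max(countsPerKey.values());
        long min = Collections.min(countsPerKey.values());
        return new long[] {totalCount, max, min};
    }
}
```

If Nhan's example were split into two windows, [k1, k2, k1] and [k4, k1, k2, k7], the second window would emit a total of 7, a maximum of 3 and a minimum of 1. The figures are only refreshed once per window, which is where the approximation comes from.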


Do note that a non-keyed window always runs with a parallelism of 1, so it might create a hotspot/bottleneck in your stream.


Regards,

Kien


On 1/25/2019 3:17 PM, Thanh-Nhan Vo wrote:

Hi Kien,

Thank you for your answer.

Please correct me if I’m wrong: if I understand correctly, when I store the max/min value in the value state of a KeyedProcessFunction, is this max/min value calculated per key?

Note that in my case, I expect to be able to obtain, at every instant, the maximum/minimum number of processed messages across all keys. For example:

- Input datastream: [message1(k1, v1), message2(k2, v2), message3(k1, v3), message4(k4, v4), message5(k1, v5), message6(k2, v6), message7(k7, v7)]
- When processing message7(k7, v7), I expect to obtain:
  - Maximum number of processed messages: 3 (corresponding to key k1)
  - Minimum number of processed messages: 1 (corresponding to keys k4 and k7)

Do you have any idea how to obtain this?

Thank you so much!

Nhan

*From:* Kien Truong [mailto:duckientru...@gmail.com]
*Sent:* Thursday, 24 January 2019 12:45
*To:* Thanh-Nhan Vo <thanh-nhan...@bleckwen.ai>; user@flink.apache.org
*Subject:* Re: [Flink 1.6] How to get current total number of processed events

Hi Nhan,

You can store the max/min value in the value state of a KeyedProcessFunction, or in the global state of a ProcessWindowFunction.

When processing each item, compare its value to the current max/min and update the stored value as needed.
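When the tracked values are per-key message counts, comparing against the stored value is enough for the maximum, because counts only grow. The minimum needs a little more: when the key holding the minimum receives another message, the new minimum may belong to a different key. One way to keep both exact on every item (a swapped-in technique, not something Flink provides out of the box) is to also track how many keys currently have each count. The sketch below is plain Java with hypothetical names and assumes every item passes through a single instance, for example a parallelism-1 operator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: exact max/min of per-key message counts, updated per item.
class PerItemCountStats {

    // key -> number of messages processed for that key
    private final Map<String, Long> countsPerKey = new HashMap<>();

    // count value -> how many keys currently have exactly that count
    private final TreeMap<Long, Integer> keysPerCount = new TreeMap<>();

    // Records one message for the given key.
    void onItem(String key) {
        long oldCount = countsPerKey.getOrDefault(key, 0L);
        long newCount = oldCount + 1;
        countsPerKey.put(key, newCount);

        // The key leaves its old count bucket (dropping the bucket if it empties)...
        if (oldCount > 0) {
            int remaining = keysPerCount.get(oldCount) - 1;
            if (remaining == 0) {
                keysPerCount.remove(oldCount);
            } else {
                keysPerCount.put(oldCount, remaining);
            }
        }
        // ...and joins the bucket for its new count.
        keysPerCount.merge(newCount, 1, Integer::sum);
    }

    // Largest per-key count so far (0 if nothing was processed).
    long maxCount() {
        return keysPerCount.isEmpty() ? 0L : keysPerCount.lastKey();
    }

    // Smallest per-key count among keys seen so far (0 if nothing was processed).
    long minCount() {
        return keysPerCount.isEmpty() ? 0L : keysPerCount.firstKey();
    }
}
```

Fed the key sequence k1, k2, k1, k4, k1, k2, k7 from Nhan's example, it reports a maximum of 3 and a minimum of 1 after message7. Each update costs O(log m), where m is the number of distinct count values.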

Regards,

Kien

On 1/24/2019 12:37 AM, Thanh-Nhan Vo wrote:

    Hi Kien Truong,


    Thank you for your answer. I have another question.
    If I count the number of messages processed for a given key j
    (denoted c_j), is there a way to retrieve max{c_j} and min{c_j}
    over all keys j?

    Thanks

    *From:* Kien Truong [mailto:duckientru...@gmail.com]
    *Sent:* Wednesday, 23 January 2019 16:04
    *To:* user@flink.apache.org <mailto:user@flink.apache.org>
    *Subject:* Re: [Flink 1.6] How to get current total number of
    processed events

    Hi Nhan,

    Logically, the total number of events processed before a given
    event cannot be calculated accurately unless event processing is
    synchronized across all parallel instances.

    This is not scalable, so I don't think Flink supports it.

    However, I suppose you can get an approximate count by using a
    non-keyed TumblingWindow: count the items inside each window, then
    use that value in the next window.
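    A plain-Java sketch of that approximation, assuming events are
    already grouped into consecutive tumbling windows. Each event is
    tagged with the total number of events from all earlier windows, so
    it only misses the events in its own window. The class and method
    names are hypothetical and the Flink window wiring is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the approximate "events processed so far" count.
class ApproximateTotalCount {

    // Total number of events in all windows that have already been closed.
    private long totalFromPreviousWindows = 0;

    // Processes one window and returns, for each of its events, the approximate
    // number of events processed before it (events in the same window are not counted).
    List<Long> processWindow(List<String> windowEvents) {
        List<Long> approximateCounts = new ArrayList<>();
        for (String event : windowEvents) {
            approximateCounts.add(totalFromPreviousWindows);
        }
        // Count the items inside this window and carry the total to the next window.
        totalFromPreviousWindows += windowEvents.size();
        return approximateCounts;
    }
}
```

    For example, with windows [e1, e2, e3] and [e4, e5], e5 is tagged
    with 3 while the exact answer would be 4; the error is bounded by the
    number of earlier events in the same window.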

    Regards,

    Kien

    On 1/21/2019 9:34 PM, Thanh-Nhan Vo wrote:

        Hello all,

        I have a question.
        I’m using Flink 1.6 to process our data in streaming mode.
        I wonder whether, at a given event, there is a way to get the
        current total number of events processed before this event.

        If possible, I want to get this total number of processed
        events as a value state in a KeyedStream.
        This means that for a given key in the KeyedStream, I want to
        retrieve not only the total number of processed events for this
        key but also the total number of processed events for all keys.

        Is there a way to do this in Flink 1.6?

        Best regards,
        Nhan
