Hello,

This is what we did, but I'm not quite convinced that it's the best way (maybe others could chime in?).

 * We have a Zalando Postgres cluster running next to the Flink
   cluster, so we can just use a JDBC sink for the state. In theory we
   should be able to switch to exactly-once (we haven't done this so far).
 * Our stateful processor is a state machine that emits outgoing
   messages based on incoming messages. Sometimes we need to "rewind"
   the state machine to correctly process an incoming message. This
   forces us to keep some history of past messages.
 * We don't materialize the state directly. We only materialize the
   state changes, which are then re-materialized in Postgres. It took
   us some time to make this bug-free. While we were still debugging
   this, we read a savepoint to inspect the state and compare it with
   what we had in Postgres.
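
To make the last point concrete, here is a minimal Python sketch of the idea, not our actual code. It folds a stream of state changes into the table a reader would query, and diffs that table against a state dump (for example one read from a savepoint). The names StateChange, rematerialize and diff_states are made up for illustration, and plain dicts stand in for both the Flink state and the Postgres table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class StateChange:
    """Hypothetical change record: set `key` to `value`, or delete it when value is None."""
    key: str
    value: Optional[Any]


def rematerialize(changes: Iterable[StateChange]) -> Dict[str, Any]:
    """Fold a stream of changes into the table a reader would query."""
    table: Dict[str, Any] = {}
    for change in changes:
        if change.value is None:
            table.pop(change.key, None)  # deletion; ignore unknown keys
        else:
            table[change.key] = change.value
    return table


def diff_states(internal: Dict[str, Any],
                external: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (internal_value, external_value)} for every mismatch."""
    mismatches: Dict[str, Tuple[Any, Any]] = {}
    for key in internal.keys() | external.keys():
        a, b = internal.get(key), external.get(key)
        if a != b:
            mismatches[key] = (a, b)
    return mismatches
```

An empty result from diff_states means the externally materialized view agrees with the internal state for every key. A missing key shows up as None on the side that lacks it.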

In a Zalando Postgres cluster you can only write to the master. For readers, however, you can load-balance across the replicas if a small delay is acceptable.
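
As a rough illustration of that read/write split (a sketch only; the EndpointRouter class and the host names are hypothetical, not something the Postgres operator provides), a client could pick its endpoint like this:

```python
from itertools import cycle
from typing import Iterator, List


class EndpointRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self._master = master
        # Fall back to the master when no replica is configured.
        self._replicas: Iterator[str] = cycle(replicas or [master])

    def for_write(self) -> str:
        # Only the master accepts writes.
        return self._master

    def for_read(self, allow_stale: bool = True) -> str:
        # Replicas may lag slightly; readers that cannot accept that
        # delay are sent to the master instead.
        if not allow_stale:
            return self._master
        return next(self._replicas)  # simple round-robin
```

In practice the load balancing would more likely be done by a service or proxy in front of the replicas than in client code. The sketch only shows the routing decision itself.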

Greetings,

Frank


On 12.02.22 16:56, Jatti, Karthik wrote:

Hi Frank,

What sink did you end up choosing for materializing the state?

We are looking at queryable state because we have many readers and very few writers (a reader-to-writer ratio in the 1000s). Each consuming application (reader) needs a live view of a subset of the state, and these applications come online and go offline many times a day. What would be a good sink in such a scenario?

For example, suppose the state of the Flink app is a dynamic table of product inventory built from Kafka streams of purchases and sales. A subset of this state needs to be available to 1000s of readers, each with a live view of what is in stock under different aggregations and filters. These consumers come online and go offline, so they need to be able to restore their substate and continue to receive updates for it.

We are evaluating sinks but haven't yet narrowed them down to an obvious choice.

Thanks,

Karthik

*From: *Jatti, Karthik <kja...@ezesoft.com>
*Date: *Friday, February 11, 2022 at 6:00 PM
*To: *Frank Dekervel <fr...@kapernikov.com>, user@flink.apache.org <user@flink.apache.org>, dwysakow...@apache.org <dwysakow...@apache.org>
*Subject: *Re: Queryable State Deprecation

Thank you Frank and Dawid for providing the context here.

*From: *Frank Dekervel <fr...@kapernikov.com>
*Date: *Friday, February 4, 2022 at 9:56 AM
*To: *user@flink.apache.org <user@flink.apache.org>
*Subject: *Re: Queryable State Deprecation

Hello,

To give an extra data point: after a not-so-successful experiment with faust-streaming, we moved our application to Flink. Since Flink's queryable state was apparently stagnant, we implemented what was needed to sink the state to an external data store for querying.

However, if queryable state had been in good shape, we would definitely have used it. Making sure that the state is always reflected correctly in our external system turned out to be non-trivial for a number of reasons: our state is not trivially convertible to rows in a table, and we sometimes had inconsistencies (due to our own bugs, but still) between the internal Flink state and the externally materialized state, especially after replaying from a checkpoint/savepoint after a crash (we cannot use exactly-once sinks in all cases).
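
One common way to make such replays harmless without an exactly-once sink is a version-guarded upsert. The sketch below is only an illustration and not necessarily what we run. It assumes every update carries a per-key version that only ever increases (something the job would have to maintain itself). The table name, column names and helper functions are made up, and SQLite stands in for Postgres so that the example runs on its own (it needs SQLite 3.24 or newer for the ON CONFLICT clause).

```python
import sqlite3
from typing import Optional, Tuple

# Assumed schema: one row per key, tagged with the version of the last
# update that was applied to it.
CREATE = """
CREATE TABLE materialized_state (
    key     TEXT PRIMARY KEY,
    version INTEGER NOT NULL,
    payload TEXT NOT NULL
)
"""

# Insert a new key, or overwrite an existing row only when the incoming
# version is strictly newer. Replayed (older or equal) updates are no-ops.
UPSERT = """
INSERT INTO materialized_state (key, version, payload)
VALUES (?, ?, ?)
ON CONFLICT (key) DO UPDATE
SET version = excluded.version, payload = excluded.payload
WHERE excluded.version > materialized_state.version
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE)
    return conn


def apply_update(conn: sqlite3.Connection, key: str, version: int, payload: str) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(UPSERT, (key, version, payload))


def read_state(conn: sqlite3.Connection, key: str) -> Optional[Tuple[int, str]]:
    cur = conn.execute(
        "SELECT version, payload FROM materialized_state WHERE key = ?", (key,)
    )
    return cur.fetchone()
```

With this guard, an at-least-once sink that re-emits old updates after a restore cannot move a row backwards. It does not help with state that cannot be expressed as one row per key, which was the other half of our problem.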

Also, obviously, we could not use Flink's partitioning/parallelism to help make state querying more scalable.

Greetings,
Frank

On 04.02.22 14:06, Dawid Wysakowicz wrote:

    Hi Karthik,

    The reason we deprecated it is that we lacked committers who
    could spend time on getting Queryable State to a
    production-ready state. I might be speaking for myself here, but I think the
    main use case for the queryable state is to have an insight into
    the current state of the application for debugging purposes. If it
    is used for data serving purposes, we believe it's better to sink
    the data into an external store, which can provide better
    discoverability and more user friendly APIs for querying the results.

    As for debugging/tracking insights you may try to achieve similar
    results with metrics.

    Best,

    Dawid

    On 01/02/2022 16:36, Jatti, Karthik wrote:

        Hi,

        I see on the Flink Roadmap that Queryable state API is
        scheduled to be deprecated but I couldn’t find much
        information on confluence or this mailing group’s archives to
        understand the background as to why it’s being deprecated and
        what would be an alternative. Any pointers to help me get
        some more information here would be great.

        Thanks,

        Karthik
