Re: What is the best way to have a cache of an external database in Flink?

David Anderson Fri, 22 Jan 2021 12:52:26 -0800

I provided an answer on stackoverflow, where I said the following:

A few different mechanisms in Flink may be relevant to this use case,
depending on your detailed requirements.

*Broadcast State*

Jaya Ananthram <https://stackoverflow.com/users/2936216/jaya-ananthram> has
already covered the idea of using broadcast state in his answer
<https://stackoverflow.com/a/65848580/2000823>. This makes sense if the
rules should be applied globally, for every key, and if you can find a way
to collect and broadcast the updates.

Note that the Context in the processBroadcastElement() of a
KeyedBroadcastProcessFunction method contains the method
applyToKeyedState(StateDescriptor<S,
VS> stateDescriptor, KeyedStateFunction<KS, S> function). This means you
can register a KeyedStateFunction that will be applied to all states of all
keys associated with the provided stateDescriptor.

*State Processor API*

If you want to bootstrap state in a Flink savepoint from a database dump,
you can do that with this library. You'll find a simple example of using
the State Processor API to bootstrap state in this gist
<https://gist.github.com/alpinegizmo/ff3d2e748287853c88f21259830b29cf>.

*Change Data Capture*

The Table/SQL API supports Debezium
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/formats/debezium.html>
, Canal
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/formats/canal.html>,
and Maxwell
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/formats/maxwell.html>
CDC
streams, and Kafka upsert streams
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/upsert-kafka.html>.
This may be a solution. There's also flink-cdc-connectors
<https://github.com/ververica/flink-cdc-connectors>.

*Lookup Joins*

Flink SQL can do temporal lookup joins against a JDBC database, with a
configurable
cache
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/jdbc.html#lookup-cache>.
Not sure this is relevant.

On Fri, Jan 22, 2021 at 7:30 PM Jan Oelschlegel <
oelschle...@integration-factory.de> wrote:

> But then you need a way to consume a database as a DataStream.
>
>
>
> I found this one https://github.com/ververica/flink-cdc-connectors.
>
>
>
> I want to implement a similar use case, but I don’t know how to parse the
> SourceRecord (which comes from the connector) into an PoJo for further
> processing.
>
>
>
> Best,
>
> Jan
>
>
>
> *Von:* Selvaraj chennappan <selvarajchennap...@gmail.com>
> *Gesendet:* Freitag, 22. Januar 2021 18:09
> *An:* Kumar Bolar, Harshith <hk...@arity.com>
> *Cc:* user <user@flink.apache.org>
> *Betreff:* Re: What is the best way to have a cache of an external
> database in Flink?
>
>
>
> Hi,
>
> Perhaps  broadcast state is natural fit for this scenario.
>
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/state/broadcast_state.html
>
>
>
>
>
> Thanks,
>
> Selvaraj C
>
>
>
> On Fri, 22 Jan 2021 at 8:45 PM, Kumar Bolar, Harshith <hk...@arity.com>
> wrote:
>
> Hi all,
>
> The external database consists of a set of rules for each key, these rules
> should be applied on each stream element in the Flink job. Because it is
> very expensive to make a DB call for each element and retrieve the rules, I
> want to fetch the rules from the database at initialization and store it in
> a local cache.
>
> When rules are updated in the external database, a status change event is
> published to the Flink job which should be used to fetch the rules and
> refresh this cache.
>
> What is the best way to achieve what I've described? I looked into keyed
> state but initializing all keys and refreshing the keys on update doesn't
> seem possible.
>
> Thanks,
>
> Harshith
>
> --
>
>
>
>
>
>
>
>
>
>
>
> Regards,
> Selvaraj C
> HINWEIS: Dies ist eine vertrauliche Nachricht und nur für den Adressaten
> bestimmt. Es ist nicht erlaubt, diese Nachricht zu kopieren oder Dritten
> zugänglich zu machen. Sollten Sie diese Nachricht irrtümlich erhalten
> haben, bitte ich um Ihre Mitteilung per E-Mail oder unter der oben
> angegebenen Telefonnummer.
>

Re: What is the best way to have a cache of an external database in Flink?

Reply via email to