Hi all,

First post here, so please be kind :)

First, some context. I have the following high-level job topology:

(1) FlinkPulsarSource -> (2) RichAsyncFunction -> (3) SinkFunction

1. The FlinkPulsarSource reads event notifications about article updates from a 
Pulsar topic
2. The RichAsyncFunction fetches the “full” article from the specified URL 
end-point, and transforms it into a “legacy” article format
3. The SinkFunction writes the “legacy” article to a (legacy) web platform, 
i.e. the sink is effectively another web site

I have this all up and running (despite lots of shading fun).

When the SinkFunction creates an article on the legacy platform, the platform 
responds with an ‘HTTP 201 - Created’ and a suitably populated Location header.

Now I need to persist that Location header or, more precisely, a map between 
the URLs on the new and legacy platforms. This is needed for later update and 
delete processing.
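
To make the requirement concrete, here is a rough sketch of the mapping I have 
in mind. All names (UrlMappingStore, InMemoryUrlMappingStore, the example URLs) 
are hypothetical, and the in-memory map is only a stand-in for whatever durable 
store turns out to be appropriate; it is not Flink API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class UrlMappingSketch {

    /** Hypothetical contract: new-platform URL -> legacy-platform URL. */
    public interface UrlMappingStore {
        void put(String newUrl, String legacyUrl);
        Optional<String> legacyUrlFor(String newUrl);
        void remove(String newUrl);
    }

    /** In-memory stand-in only; a real job would need something durable. */
    public static class InMemoryUrlMappingStore implements UrlMappingStore {
        private final Map<String, String> mappings = new ConcurrentHashMap<>();

        @Override
        public void put(String newUrl, String legacyUrl) {
            mappings.put(newUrl, legacyUrl);
        }

        @Override
        public Optional<String> legacyUrlFor(String newUrl) {
            return Optional.ofNullable(mappings.get(newUrl));
        }

        @Override
        public void remove(String newUrl) {
            mappings.remove(newUrl);
        }
    }

    public static void main(String[] args) {
        UrlMappingStore store = new InMemoryUrlMappingStore();

        // Create: the sink records the Location header from the 201 response.
        store.put("https://new.example.com/articles/42",
                  "https://legacy.example.com/node/1001");

        // Update: a later event for the same article looks up the legacy URL.
        System.out.println(store.legacyUrlFor("https://new.example.com/articles/42"));

        // Delete: the mapping is dropped once the legacy article is removed.
        store.remove("https://new.example.com/articles/42");
        System.out.println(store.legacyUrlFor("https://new.example.com/articles/42").isPresent());
    }
}
```

The sink would call put() on create, while update and delete events would call 
legacyUrlFor() before talking to the legacy platform.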

The question is how do I store this mapping information?

I’ve spent some time trying to grok state management and the various backends, 
but from what I can see, state management is focused on “operator scoped” 
state. This seems reasonable given the requirement for barriers etc. to ensure 
accurate recovery.

However, I need some persistence between operators (shared state?) and with 
longevity beyond the processing of an operator.

My gut reaction is that I need an external K-V store such as Ignite (or 
similar). Frankly, given that Flink ships with embedded RocksDB, I was hoping 
to use that, but there seems to be no obvious way to do this, and lots of 
advice saying don’t :)

Am I missing something obvious here?

Many thanks in advance

Paul

