Hi Gordon,

Operating on checkpoints instead of savepoints might be OK. But since 
this is not in the current scope, I dug into the Flink docs and found 
"queryable state" 
(https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/state/queryable_state.html#querying-state).

This sounds good and seems to offer a way to read the state of a specific 
function by id. This would solve the first part of my challenge (examining the 
current state). Additionally, there is a remote client, which makes things easy.

As far as I understand, it's only necessary to enable this for statefuns. If 
types like PersistedValue also took a queryable name, as ValueStateDescriptor 
does, it could be passed through in places like 
https://github.com/apache/flink-statefun/blob/master/statefun-flink/statefun-flink-core/src/main/java/org/apache/flink/statefun/flink/core/state/FlinkState.java#L65.
Then the state of single jobs could be retrieved, if I'm right. But I could 
only query the states of a specific statefun by id, not the total crowd of 
states.

To get a solution in the "near" future, I could send "state change" egress 
messages and stream them into an Elasticsearch sink. Then I could search that 
ES index the way I like. I only have to check whether that works in terms of 
data volume and throughput. Additionally, I'll have to consider how to 
structure those "state change" events in ES so that I can query them as I 
need. As a bonus, I would get historical data for states that are outdated or 
cleared.
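
A minimal sketch of how such a "state change" event could be structured before 
it is handed to the sink, assuming one event per update of a persisted value. 
All names here (StateChangeEvent, the field names, the "example/order" function 
type, the two id schemes) are hypothetical and not part of StateFun or 
Elasticsearch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical "state change" event: one per update of a persisted value. */
final class StateChangeEvent {
    final String functionType;  // assumed naming, e.g. "example/order"
    final String functionId;    // id of the individual function instance
    final String stateName;     // name of the persisted value or table
    final String value;         // serialized value; null means "cleared"
    final long timestampMillis; // time of the change

    StateChangeEvent(String functionType, String functionId, String stateName,
                     String value, long timestampMillis) {
        this.functionType = functionType;
        this.functionId = functionId;
        this.stateName = stateName;
        this.value = value;
        this.timestampMillis = timestampMillis;
    }

    /** Id for a "latest state" index: one document per function id and state name. */
    String latestDocId() {
        return functionType + "|" + functionId + "|" + stateName;
    }

    /** Id for a "history" index: one document per individual change. */
    String historyDocId() {
        return latestDocId() + "|" + timestampMillis;
    }

    /** Flat field map that could be serialized to JSON and indexed. */
    Map<String, Object> toDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("functionType", functionType);
        doc.put("functionId", functionId);
        doc.put("stateName", stateName);
        doc.put("value", value);
        doc.put("cleared", value == null);
        doc.put("timestamp", timestampMillis);
        return doc;
    }
}
```

Indexing under latestDocId() would overwrite the previous document, so that 
index would reflect the current state of each function, while indexing under 
historyDocId() would keep every change, including cleared values.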

This sounds like a feasible solution. What do you think?

Cheers,
Stephan


From: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
Sent: Thursday, 28 January 2021 04:06
To: Stephan Pelikan <stephan.peli...@phactum.at>
Cc: user@flink.apache.org
Subject: Re: Stateful Functions - accessing the state aside of normal processing

Hi Stephan,

Great to hear about your experience with StateFun so far!

I think what you are looking for is a way to read StateFun checkpoints, which 
are basically immutable, consistent point-in-time snapshots of all the states 
across all your functions, and then run some computation on them or simply 
explore the state values.
StateFun checkpoints are essentially adopted from Flink, so you can find more 
detail about that here [1].

Currently, StateFun does provide a means for state "bootstrapping": running a 
batch offline job to write and compose a StateFun checkpoint [2].
What is still missing is the "reading / analysis" side of things, to do exactly 
what you described: running a separate batch offline job for reading and 
processing an existing StateFun checkpoint.

Before we dive into details on what that may look like, do you think that is 
what you would need?

Although I don't think we would be able to support such a feature yet since 
we're currently focused on reworking the SDKs and request-reply protocol, in 
any case it would be interesting to discuss if this feature would be important 
for multiple users already.

Cheers,
Gordon

[1] 
https://ci.apache.org/projects/flink/flink-docs-master/concepts/stateful-stream-processing.html#checkpointing
[2] 
https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/deployment-and-operations/state-bootstrap.html

On Wed, Jan 27, 2021 at 11:41 PM Stephan Pelikan 
<stephan.peli...@phactum.at> wrote:
Hi,

We are trying to use Statefuns for our tool and it seems to be a good fit. I 
already adopted it and it works quite well. However, we have millions of 
different states (all the same FunctionType but different ids) and each state 
consists of several @Persisted values (values and tables). We want to build an 
administration tool for examining the crowd of states (count, histogram, etc.) 
and each state in detail (the persisted-tables and -values).

Additionally we need some kind of dig-down functionality for finding those 
individual states. For example some of those persisted values can be used to 
categorize the crowd of states.

My question now is how to achieve this. Is there a way to browse and examine 
statefuns in a read-only fashion (their ids, their persisted values)? How can 
one achieve this without duplicating the state in, e.g., a relational database?

Thanks,
Stephan

PS: I have other questions, but I will send them in separate mails to avoid 
mixing up topics.
