Re: Best Practice for Querying Flink State

Chen Qin Mon, 29 Aug 2022 20:53:47 -0700

Hi Lu & Ken,

Flink is a stream processing engine (albeit a stateful one) that doesn't aim to
serve queries directly.
When it comes to serving systems, AFAIK, there are two camps of user
requirements.


- the one that runs really simple queries (a single index, like DynamoDB)
while serving a large number of reads/updates.
- the one that runs complex queries (per-column indexing, like Pinot/Druid)
while serving a small number of reads and small updates.

Given the segmented nature of serving systems, Flink is best used to push
inserts/updates of the state you want to query into a serving system that
fits the query pattern.

One idea is to emit CDC events (e.g. add/remove/update) for a Flink state to
a side output. The writes to the serving system can then be implemented as
conditional updates, so that they tolerate restarts of the Flink job.
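
To make the conditional-update part concrete, here is a rough sketch in
plain Python (not Flink code). It assumes each change event carries a
per-key version that the job restores from its checkpoint, so events
replayed after a restart carry the same versions as before. StateChange,
ServingStore and the version field are hypothetical names for illustration;
a real serving system would use its own conditional-write mechanism.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StateChange:
    """One CDC event for a keyed state entry (hypothetical shape)."""
    op: str                # "add", "update", or "remove"
    key: str
    value: Optional[str]   # None for "remove"
    version: int           # assumed to increase per key and survive restarts


class ServingStore:
    """In-memory stand-in for a serving system that supports conditional writes."""

    def __init__(self) -> None:
        # key -> (last applied version, value); value None marks a tombstone
        self._rows: Dict[str, Tuple[int, Optional[str]]] = {}

    def apply(self, change: StateChange) -> bool:
        """Apply the change only if it is newer than what is already stored."""
        if change.op not in ("add", "update", "remove"):
            raise ValueError(f"unknown op: {change.op}")
        current = self._rows.get(change.key)
        if current is not None and current[0] >= change.version:
            return False  # stale or replayed event, skip it
        value = None if change.op == "remove" else change.value
        # Removes are kept as tombstones so an older replayed update
        # cannot resurrect the key.
        self._rows[change.key] = (change.version, value)
        return True

    def get(self, key: str) -> Optional[str]:
        row = self._rows.get(key)
        return None if row is None else row[1]


store = ServingStore()
events = [
    StateChange("add", "user-1", "a", 1),
    StateChange("update", "user-1", "b", 2),
    StateChange("add", "user-2", "x", 1),
    StateChange("remove", "user-2", None, 2),
]
first_pass = [store.apply(e) for e in events]

# Simulate a job restart that re-emits the same events from the last checkpoint.
replay_pass = [store.apply(e) for e in events]
```

With this guard, the replayed events are all rejected and the store ends up
in the same state as after the first pass, which is the property the sink
needs across restarts.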

Chen


On Mon, Aug 29, 2022 at 7:15 PM Ken Krugler <kkrugler_li...@transpac.com>
wrote:

> Hi Lu,
>
> It would be helpful to know about your query requirements, before making a
> recommendation.
>
> E.g. does it just need to be a key-value store, and thus you’re querying
> by a single key (which has to match the state partitioning key)?
>
> What about latency requirements? E.g. if you’re processing Flink state
> (option 3) then latency is going to be high.
>
> As a final take-away, in my experience I’ve always wound up shoving data
> into a separate system (Pinot is my current favorite) for queries.
>
> — Ken
>
>
> On Aug 29, 2022, at 3:19 PM, Lu Niu <qqib...@gmail.com> wrote:
>
> Hi, Flink Users
>
> We have a use case that requires running ad hoc queries against Flink
> state. There are several options:
>
> 1. Dump Flink state to external data systems, like Kafka, S3, etc., and
> query the data from there. This is a very straightforward approach, but
> adds system complexity and overall cost.
> 2. Flink Queryable State. This requires additional development, and when
> the job is down we cannot query the data, which defeats the purpose of
> debugging in the first place. Also, I have heard that this feature is on
> the deprecation list.
> 3. Flink State API. This requires additional development.
>
> I am wondering what best practices are applied in production. For me,
> I really hope there is one product that 1. lets me query Flink state
> using SQL and 2. is decoupled from the Flink job.
>
> Best
> Lu
>
>
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
>
>
>
>
