Re: Best Practice for Querying Flink State

Ken Krugler Mon, 29 Aug 2022 19:15:49 -0700

Hi Lu,

It would be helpful to know more about your query requirements before making a 
recommendation.


E.g. does it just need to be a key-value store, and thus you’re querying by a 
single key (which has to match the state partitioning key)?
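
To make the "has to match the state partitioning key" point concrete, here is a minimal sketch of why a lookup by that key is cheap: the key alone determines which partition holds the value. Flink assigns each key to a key group and maps key groups to parallel subtasks; the class and method names below are hypothetical, and the hash is a simplified stand-in (Flink applies a murmur hash to the key's hashCode), so treat this as an illustration rather than Flink's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedLookupSketch {
    // Simplified stand-in for key-group assignment (assumption: plain modulo
    // instead of the murmur hash Flink applies to key.hashCode()).
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a key group to the index of the parallel subtask that owns it.
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    private final int parallelism;
    private final int maxParallelism;
    // One in-memory map per subtask, standing in for that subtask's keyed state.
    private final List<Map<Object, Long>> partitions = new ArrayList<>();

    KeyedLookupSketch(int parallelism, int maxParallelism) {
        this.parallelism = parallelism;
        this.maxParallelism = maxParallelism;
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // The key alone decides which partition is touched.
    private Map<Object, Long> partitionOf(Object key) {
        int group = keyGroup(key, maxParallelism);
        return partitions.get(subtaskFor(group, parallelism, maxParallelism));
    }

    void put(Object key, long value) {
        partitionOf(key).put(key, value);
    }

    // Point lookup: exactly one partition is consulted; returns null if absent.
    Long get(Object key) {
        return partitionOf(key).get(key);
    }

    public static void main(String[] args) {
        KeyedLookupSketch state = new KeyedLookupSketch(4, 128);
        state.put("user-1", 10L);
        System.out.println(state.get("user-1"));
        System.out.println(state.get("user-2"));
    }
}
```

A query on any field other than the partitioning key has no such shortcut and would have to scan every partition, which is where a separate query system starts to pay off.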

What about latency requirements? E.g. if you’re processing Flink state (option 
3) then query latency is going to be high.

As a final take-away, in my experience I’ve always wound up shoving data into a 
separate system (Pinot is my current favorite) for queries.

— Ken


> On Aug 29, 2022, at 3:19 PM, Lu Niu <qqib...@gmail.com> wrote:
> 
> Hi, Flink Users
> 
> We have a use case that requires running ad hoc queries against Flink 
> state. There are several options:
> 
> 1. Dump Flink state to external data systems, like Kafka, S3, etc., and query 
> the data from there. This is a very straightforward approach, but it adds 
> system complexity and overall cost. 
> 2. Flink Queryable State. This requires additional development, and when 
> the job is down we cannot query the data, which defeats the purpose of 
> debugging in the first place. Also, I have heard that this feature is on 
> the deprecation list. 
> 3. Flink State API. This requires additional development. 
> 
> I am wondering what best practices are applied in production. Ideally, I am 
> hoping for a single product that (1) lets me query Flink state using SQL 
> and (2) is decoupled from the Flink job. 
> 
> Best
> Lu
> 
> 

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
