Hi Lu & Ken,

Flink is a stream processing engine (albeit a stateful one) that doesn't aim to serve queries directly. Serving systems, AFAIK, fall into two camps of user requirements:
- One runs really simple queries (a single index, like DynamoDB) and serves a large number of reads/updates.
- The other runs complex queries (per-column indexing, like Pinot/Druid) and serves a small number of reads and small updates.

Given the segmented nature of serving systems, Flink is best used to push state inserts/updates into whichever serving system fits the queries you need to run.

One idea is to emit CDC events for a Flink state (e.g. add/remove/update) to a side output. The sink could then apply conditional updates to the serving system, so that it can handle restarts of the Flink job.

Chen

On Mon, Aug 29, 2022 at 7:15 PM Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi Lu,
>
> It would be helpful to know about your query requirements, before making a
> recommendation.
>
> E.g. does it just need to be a key-value store, and thus you’re querying
> by a single key (which has to match the state partitioning key)?
>
> What about latency requirements? E.g. if you’re processing Flink state
> (option 3) then this is going to be large.
>
> As a final take-away, in my experience I’ve always wound up shoving data
> into a separate system (Pinot is my current favorite) for queries.
>
> -- Ken
>
>
> On Aug 29, 2022, at 3:19 PM, Lu Niu <qqib...@gmail.com> wrote:
>
> Hi, Flink Users
>
> We have a user case that requests running ad hoc queries to query flink
> state. There are several options:
>
> 1. Dump flink state to external data systems, like kafka, s3 etc. from
> there we can query the data. This is a very straightforward approach, but
> adds system complexity and overall cost.
> 2. Flink Queryable State. This requires additional development and also
> when the job is down, we can not query the data, which violates the need
> for debugging in the first place. Last, from some channel I happen to know
> this feature is on the deprecation list.
> 3. Flink State API. This requires additional development.
>
> I am wondering what are some best practices applied in production. For me,
> I really hope there is one product that 1. let me query the flink state
> using SQL 2. decouple with flink job
>
> Best
> Lu
>
>
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
>
>
>
>
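
P.S. To make the conditional-update idea at the top of this reply a bit more concrete, here is a minimal Python sketch of the serving-side half only. Everything in it is an assumption for illustration: `ChangeEvent` and `ServingStore` are hypothetical names, the store is an in-memory stand-in rather than a real serving system, and it assumes each change event carries a per-key `version` that only increases. The point is just that a version-guarded write makes replayed events harmless after the job restarts.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One state change emitted by the job (hypothetical shape)."""
    op: str                # "add", "update", or "remove"
    key: str
    value: Optional[str]   # None for "remove"
    version: int           # assumed to increase monotonically per key


class ServingStore:
    """In-memory stand-in for a serving system with conditional writes."""

    def __init__(self) -> None:
        # key -> (version, value); a removed key keeps a tombstone (value None)
        self._rows: Dict[str, Tuple[int, Optional[str]]] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Apply the event only if it is newer than what is already stored."""
        current = self._rows.get(event.key)
        if current is not None and current[0] >= event.version:
            return False  # stale or replayed event, e.g. after a job restart
        value = None if event.op == "remove" else event.value
        self._rows[event.key] = (event.version, value)
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the current value, or None if the key is absent or removed."""
        row = self._rows.get(key)
        return None if row is None else row[1]
```

Applying the same batch of events twice leaves the store unchanged the second time, which is the behaviour you want when Flink replays from the last checkpoint. The tombstone kept on "remove" is a design choice so that a late, older "update" cannot resurrect a deleted key.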