The configuration might be lost in application mode when JobManager restarts

2022-08-29 Thread yu'an huang
Hi team, We found a case that the job configuration would be lost in application mode if the job manager restarted. When developing a job, users might want to set their configuration in the main method of their user program. This is fine for YARN per job mode. The client will run the user pro

How Flink knows that CREATE TABLE is sometimes about creating table, sometimes about creating file?

2022-08-29 Thread podunk
  To create table from file:   "CREATE TABLE Table1 (column_name1 STRING, column_name2 DOUBLE) WITH ('connector' = 'filesystem', 'path' = 'file:///C:/temp/test4.txt', 'format' = 'csv', 'csv.field-delimiter' = ';')");   To create file:   "CREATE TABLE Table1 (column_name1 STRING, column_name2

Re: How Flink knows that CREATE TABLE is sometimes about creating table, sometimes about creating file?

2022-08-29 Thread David Anderson
The role of CREATE TABLE is to provide the necessary metadata for the table -- the location of the data, its format, etc. Executing CREATE TABLE creates an entry in the catalog, but otherwise doesn't do anything. In a case like this one CREATE TABLE Table1 (column_name1 STRING, column_name2 DOUBL

Deadlock in Subtask in the FlinkKinesisConsumer

2022-08-29 Thread Seth Saperstein via user
Hi I wanted to bring awareness to this Jira describing a deadlock state we've experienced for a single subtask in the FlinkKinesisConsumer. This occurs when we've reached the following conditions in the subtask: - reached the max lookahead so th

Best Practice for Querying Flink State

2022-08-29 Thread Lu Niu
Hi, Flink Users We have a user case that requests running ad hoc queries to query flink state. There are several options: 1. Dump flink state to external data systems, like kafka, s3 etc. from there we can query the data. This is a very straightforward approach, but adds system complexity and ove

Kafka bounded source not completing

2022-08-29 Thread Vinod Mohanan via user
Hello, I am using Flink version:1.14.2 I have a pipeline in which I want to fetch data from kafka and write to S3. The fetch from kafka needs to be bound from timestamp t1 to timestamp t1+n. I intend to run this in batch mode, but a streaming pipeline which is scheduled to start on trigger and sto

Re: Best Practice for Querying Flink State

2022-08-29 Thread Ken Krugler
Hi Lu, It would be helpful to know about your query requirements, before making a recommendation. E.g. does it just need to be a key-value store, and thus you’re querying by a single key (which has to match the state partitioning key)? What about latency requirements? E.g. if you’re processing

Re: Best Practice for Querying Flink State

2022-08-29 Thread Chen Qin
Hi Lu & Ken, Flink is a stream processing engine (albeit stateful) that doesn't aim to serve queries directly. When it comes to serving systems, AFAIK, has two campuses of user requirements. - the one that runs a really simple query (single indexing, like dynamo) serving a large number of reads/