Kafka transactioins & flink checkpoints

2022-11-15 Thread Vishal Surana
I wanted to achieve exactly once semantics in my job and wanted to make sure I understood the current behaviour correctly: 1. Only one Kafka transaction at a time (no concurrent checkpoints) 2. Only one transaction per checkpoint My job has very large amount of state (>100GB) and I have no opti

Missing image tag in apache/flink repository ?

2022-11-15 Thread Alon Halimi via user
Hello :) It seems the tag "apache/flink:1.16.0-scala_2.12" is missing - I get the following error: failed to pull and unpack image "docker.io/apache/flink:1.16.0-scala_2.12" note that: (1) /apache/flink:1.16.0-scala_2.12 (without the 0 version suffix ) does exist (2) /flink:1.16.0-scala_2.12 (

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-15 Thread Theodor Wübker
Yes, you are right. Schemas are not so nice in Json. When implementing and testing my converter from DataType to JsonSchema I noticed that your converter from JsonSchema to DataType converts number to double always. I wonder, did you make this up? Because the table that specifies the mapping

Re: [blog article] Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2022-11-15 Thread Etienne Chauchot
Hi, Any feedback on the interest of the API benchmark article below ? Best Etienne Le 09/11/2022 à 12:18, Etienne Chauchot a écrit : Hi, And by the way, I was planing on writing another article to compare the performances of DataSet, DataStream and SQL APIs over TPCDS query3. I thought th

[SUMMARY] Flink 1.17 Release Sync 11/15/2022

2022-11-15 Thread Leonard Xu
Hi devs and users, I’d like to share some highlights about the 1.17 release sync on 11/15/2022. - Release tracking page: - The community has collected some great features on the 1.17 page[1] - @committers Please continuously update the page in the coming week - JIRA account apply

Re: Missing image tag in apache/flink repository ?

2022-11-15 Thread godfrey he
Thanks for reporting this, I will resolve it ASAP. Best, Godfrey Alon Halimi via user 于2022年11月15日周二 16:46写道: > > Hello :) > > > > It seems the tag “apache/flink:1.16.0-scala_2.12” is missing – I get the > following error: > > > > failed to pull and unpack image "docker.io/apache/flink:1.16.0-s

Any way to improve list state get performance

2022-11-15 Thread tao xiao
Hi team, I have a Flink job that joins two streams, let's say A and B streams, followed by a key process function. In the key process function the job inserts elements from B stream to a list state if element from A stream hasn't arrived yet. I am wondering if any way to skip the liststat.get() to

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-15 Thread Andrew Otto
> Also thanks for showing me your pattern with the SchemaConversions and stuff. Feels pretty clean and worked like a charm :) Glad to hear it, that is very cool! > converts number to double always. I wonder, did you make this up? Yes, we chose the the mapping. We chose to do number -> double and

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-15 Thread Andrew Otto
> meaning that double and integer I meant to write: "meaning that double and bigint ... " :) On Tue, Nov 15, 2022 at 8:54 AM Andrew Otto wrote: > > Also thanks for showing me your pattern with the SchemaConversions and > stuff. Feels pretty clean and worked like a charm :) > Glad to hear it, tha

Kafka transactions drastically limit usability of Flink savepoints

2022-11-15 Thread Yordan Pavlov
Hi, we are using Kafka savepoints as a recovery tool and want to store multiple ones for the past months. However as we use Kafka transactions for our KafkaSink this puts expiration time on our savepoints. We can use a savepoint only as old as our Kafka transaction timeout. The problem is explained

Re: Kafka transactioins & flink checkpoints

2022-11-15 Thread Yaroslav Tkachenko
Hi Vishal, Just wanted to comment on this bit: > My job has very large amount of state (>100GB) and I have no option but to use unaligned checkpoints. I successfully ran Flink jobs with 10+ TB of state and no unaligned checkpoints enabled. Usually, you consider enabling them when there is some k

Re: Kafka transactions drastically limit usability of Flink savepoints

2022-11-15 Thread Piotr Nowojski
Hi Yordan, I don't understand where the problem is, why do you think savepoints are unusable? If you recover with `ignoreFailuresAfterTransactionTimeout` enabled, the current Flink behaviour shouldn't cause any problems (except for maybe some logged errors). Best, Piotrek wt., 15 lis 2022 o 15:3

Reading Parquet file with array of structs cause error

2022-11-15 Thread Benenson, Michael via user
Hi, folks I’m using flink 1.16.0, and I would like to read Parquet file (attached), that has schema [1]. I could read this file with Spark, but when I try to read it with Flink 1.16.0 (program attached) using schema [2] I got IndexOutOfBoundsException [3] My code, and parquet file are attache

Re: Reading Parquet file with array of structs cause error

2022-11-15 Thread Jing Ge
Hi Michael, Currently, ParquetColumnarRowInputFormat does not support schemas with nested columns. If your parquet file stores Avro records. You might want to try e.g. Avro Generic record[1]. [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/formats/parquet

Re: Reading Parquet file with array of structs cause error

2022-11-15 Thread Benenson, Michael via user
Thanks, Jing Do you know, if this problem will be addressed in FLINK-28867 or it deserve a separate Jira? From: Jing Ge Date: Tuesday, November 15, 2022 at 3:39 PM To: Benenson, Michael Cc: user@flink.apache.org , Deshpande, Omkar , Vora,

Re: Reading Parquet file with array of structs cause error

2022-11-15 Thread liu ron
It will be addressed in FLINK-28867. Best, Ron Benenson, Michael via user 于2022年11月16日周三 08:47写道: > Thanks, Jing > > > > Do you know, if this problem will be addressed in FLINK-28867 > or it deserve a > separate Jira? > > > > > > *From: *Jing