Re: How to write custom serializer for dynamodb connector

2022-11-09 Thread Danny Cranmer
Hey Matt, thanks for the feedback. I have updated the SinkIntoDynamoDb [1] sample to avoid this in the future. We have recently added support for @DynamoDbBean annotated POJOs, which you might find interesting. This removes the need to create a custom ElementConverter altogether; see SinkDynamoDbBean…
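As a sketch of what the @DynamoDbBean route looks like: the connector expects a bean-style class with a no-arg constructor and getters/setters. The `OrderEvent` name and fields below are made up for illustration, and the AWS SDK v2 annotations are shown only in comments so the sketch compiles standalone.

```java
// Bean-style POJO sketch for the DynamoDB sink's @DynamoDbBean support.
// With the AWS SDK v2 enhanced client on the classpath you would add
// @DynamoDbBean on the class and @DynamoDbPartitionKey on the key getter
// (annotations omitted here so the sketch stands alone).
class OrderEvent {
    private String orderId;   // would carry @DynamoDbPartitionKey on its getter
    private double amount;

    public OrderEvent() {}    // bean-style no-arg constructor is required

    public String getOrderId() { return orderId; }
    public void setOrderId(String orderId) { this.orderId = orderId; }
    public double getAmount() { return amount; }
    public void setAmount(double amount) { this.amount = amount; }
}
```

With the connector's bean support, such a class can be handed to the sink directly instead of hand-writing an ElementConverter.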

Re: [blog article] Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2022-11-09 Thread Etienne Chauchot
Hi Yun Gao, thanks for your email and your review! My comments are inline. On 08/11/2022 at 06:51, Yun Gao wrote: Hi Etienne, Many thanks for the article! Flink is indeed continuing to increase its ability to unify batch and stream processing under the same API, and it's a great pleas…

Re: [blog article] Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2022-11-09 Thread Etienne Chauchot
Hi, And by the way, I was planning to write another article comparing the performance of the DataSet, DataStream, and SQL APIs on TPC-DS query 3. I thought I could run the pipelines on an Amazon EMR cluster with different data sizes: 1 GB, 100 GB, and 1 TB. Would it be worth it? What do you think?

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-09 Thread Andrew Otto
Hello! I see you are talking about JSONSchema, not just JSON itself. We're trying to do a similar thing at Wikimedia and have developed some tooling around this: JsonSchemaFlinkConverter…

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-09 Thread Andrew Otto
> I want to convert the schema of a Flink table to both Protobuf *schema* and JSON *schema* Oh, you want to convert from a Flink schema TO a JSONSchema? Interesting. That would indeed be something that is not usually done. Just curious, why do you want to do this? On Wed, Nov 9, 2022 at 8:46 AM And…

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-09 Thread Theodor Wübker
Hey, thank you for your reply. Your converter looks very interesting. However, Flink comes with the JsonRowSchemaConverter, which already converts a JSONSchema string to a TypeInformation. From there you can convert the TypeInformation to, say, a DataType (although I must admit I only got this do…

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-09 Thread Theodor Wübker
I want to register the result schema in a schema registry, as I am pushing the result data to a Kafka topic. The result schema is not known at compile time, so I need to find a way to compute it at runtime from the resulting Flink schema. -Theo (resent - again, sorry, I forgot to add the others…)

Re: [blog article] Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API

2022-11-09 Thread Etienne Chauchot
Hi Yun Gao, FYI I just updated the article after your review: https://echauchot.blogspot.com/2022/11/flink-howto-migrate-real-life-batch.html Best, Etienne On 09/11/2022 at 10:04, Etienne Chauchot wrote: Hi Yun Gao, thanks for your email and your review! My comments are inline. On 08/1…

Re: Converting ResolvedSchema to JSON and Protobuf Schemas

2022-11-09 Thread Andrew Otto
Interesting, yeah, I think you'll have to implement code to recurse through the (Row) DataType and somehow auto-generate the JSONSchema you want. We abstracted the conversions from JSONSchema to other type systems in this JsonSchemaConverter…

Flink fails to authenticate with adlsgen2 azure storage account using managed identities in an Azure Kubernetes Cluster

2022-11-09 Thread DEROCCO, CHRISTOPHER
Flink fails to authenticate with an ADLS Gen2 Azure storage account using managed identities in an Azure Kubernetes cluster. We receive the following error from Flink when we try to configure managed identities to authenticate to ADLS Gen2: Caused by: Unable to load key provider class. at…
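For reference, a sketch of the ABFS managed-identity (MSI) settings, assuming the hadoop-azure ABFS connector is on the classpath; the "Unable to load key provider class" error often points at a mis-set `fs.azure.*` auth key. The account name, client ID, and tenant ID below are placeholders.

```
# flink-conf.yaml / core-site.xml equivalents (hadoop-azure ABFS connector keys)
fs.azure.account.auth.type: OAuth
fs.azure.account.oauth.provider.type: org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider
fs.azure.account.oauth2.client.id: <managed-identity-client-id>
fs.azure.account.oauth2.msi.tenant: <tenant-id>
```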

RocksDB checkpoints clean up with TaskManager restart.

2022-11-09 Thread Vidya Sagar Mula
Hi, I am using the RocksDB state backend for incremental checkpointing with Flink 1.11. Question: for a given job ID, intermediate RocksDB checkpoints are stored under the path defined with "". The files are stored with "_jobID + random UUID" prefixed to the location. Case 1: …
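For context, a sketch of the Flink 1.11 settings that produce the layout described above (the directory path is a placeholder):

```
# flink-conf.yaml: RocksDB state backend with incremental checkpoints
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs:///flink/checkpoints   # placeholder location
```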

Any caveats about processing abstract classes ?

2022-11-09 Thread Davide Bolcioni via user
Greetings, I am looking at a Flink pipeline processing events consumed from a Kafka topic, which now needs to also consume events that have a different, but related, schema. Traditional Java OOP would suggest transitioning from class Dog { ... } new FilterFunction { ... } to abstract class Animal…
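The refactoring being described can be sketched in plain Java. In a Flink job the predicate would be a `FilterFunction<Animal>` instead of `java.util.function.Predicate`; the class and field names here are made up for illustration. One caveat worth checking: streams typed to an abstract class may fall back to Flink's generic (Kryo) serialization rather than the POJO serializer, so it is worth verifying how the concrete subtypes get serialized.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Plain-Java sketch of moving a filter from a concrete class to an
// abstract supertype, so both event schemas flow through one predicate.
abstract class Animal {
    final String name;
    Animal(String name) { this.name = name; }
    abstract boolean isVaccinated();
}

class Dog extends Animal {
    Dog(String name) { super(name); }
    @Override boolean isVaccinated() { return true; }
}

class Cat extends Animal {
    Cat(String name) { super(name); }
    @Override boolean isVaccinated() { return false; }
}

class AnimalFilter {
    // The filter now operates on the abstract supertype; in Flink this
    // would be a FilterFunction<Animal> applied to a DataStream<Animal>.
    static List<Animal> keepVaccinated(List<Animal> in) {
        Predicate<Animal> vaccinated = Animal::isVaccinated;
        return in.stream().filter(vaccinated).collect(Collectors.toList());
    }
}
```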

Re: How to get checkpoint stats after job has terminated

2022-11-09 Thread yidan zhao
First of all, you should trigger a savepoint before stopping the job, and then you can restart the job from that savepoint. For checkpoints, you need to set 'execution.checkpointing.externalized-checkpoint-retention' to 'RETAIN_ON_CANCELLATION'. You can get the checkpoint info via the history server.
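The retention setting mentioned above goes in flink-conf.yaml; as a sketch:

```
# flink-conf.yaml: keep externalized checkpoints after the job is cancelled,
# so their metadata survives for later restarts or inspection.
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
```

The stop-with-savepoint step is a CLI call of the form `flink stop -p <targetDirectory> <jobId>` (paths and job ID are placeholders), after which `flink run -s <savepointPath> ...` resumes from it.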