StreamingFileSink with ParquetAvroWriters

2021-01-13 Thread Jan Oelschlegel
Hi, I'm using Flink (v1.11.2) and would like to use the StreamingFileSink for writing into HDFS in Parquet format. As described in the documentation, I have added the dependency org.apache.flink:flink-parquet_${scala.binary.version}:${flink.version}. And this is my file sink definition
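A minimal sketch of such a file sink definition in Scala, assuming an Avro SpecificRecord class named CovidEvent (the class name appears later in this digest) and an illustrative HDFS output path; both are placeholders, not taken from the original message:

    import org.apache.flink.core.fs.Path
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Checkpointing must be enabled: bulk formats like Parquet roll part files on checkpoint.
    env.enableCheckpointing(60000)

    val sink: StreamingFileSink[CovidEvent] = StreamingFileSink
      .forBulkFormat(
        new Path("hdfs:///data/covid-events"),
        ParquetAvroWriters.forSpecificRecord(classOf[CovidEvent]))
      .build()

    // given some upstream stream of CovidEvent records:
    // val events: DataStream[CovidEvent] = ...
    // events.addSink(sink)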

RE: StreamingFileSink with ParquetAvroWriters

2021-01-15 Thread Jan Oelschlegel
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/project-configuration.html#create-project Best, Jan From: Dawid Wysakowicz Sent: Thursday, 14 January 2021 12:42 To: Jan Oelschlegel; user@flink.apache.org Subject: Re: StreamingFileSink with ParquetAvroWriters Hi Jan, could

RE: StreamingFileSink with ParquetAvroWriters

2021-01-15 Thread Jan Oelschlegel
To: Jan Oelschlegel; user@flink.apache.org Subject: Re: StreamingFileSink with ParquetAvroWriters Hi Jan, could you have a try by adding this dependency? org.apache.parquet:parquet-avro:1.11.1 Best, Yun -- Original Mail -- Sender: Jan

RE: RE: StreamingFileSink with ParquetAvroWriters

2021-01-15 Thread Jan Oelschlegel
Yes, after unzipping it is in the jar: [inline screenshot] From: Dawid Wysakowicz Sent: Friday, 15 January 2021 12:10 To: Jan Oelschlegel; user@flink.apache.org Subject: Re: RE: StreamingFileSink with ParquetAvroWriters Have you checked if the class (org/apache/parquet

RE: RE: StreamingFileSink with ParquetAvroWriters

2021-01-15 Thread Jan Oelschlegel
Sorry, wrong build. It is not in the jar. From: Jan Oelschlegel Sent: Friday, 15 January 2021 12:52 To: Dawid Wysakowicz; user@flink.apache.org Subject: RE: RE: StreamingFileSink with ParquetAvroWriters Yes, after unzipping it is in the jar: [inline screenshot] From

DataStream API: Best way for reading a CSV file

2021-01-22 Thread Jan Oelschlegel
Hi, I'm looking for a convenient way to read a CSV file with the DataStream API in Flink 1.11, without using the Table/SQL API first. This is my first approach: val typeInfo = TypeInformation.of(classOf[CovidEvent]).asInstanceOf[PojoTypeInfo[CovidEvent]] val csvInputFormat = new PojoCsv
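One way that first approach is typically completed, assuming the truncated class is Flink's PojoCsvInputFormat and CovidEvent is a valid POJO; the file path is a placeholder:

    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.api.java.io.PojoCsvInputFormat
    import org.apache.flink.api.java.typeutils.PojoTypeInfo
    import org.apache.flink.core.fs.Path
    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val typeInfo = TypeInformation.of(classOf[CovidEvent]).asInstanceOf[PojoTypeInfo[CovidEvent]]

    // Maps the CSV columns onto the fields of the CovidEvent POJO.
    val csvInputFormat = new PojoCsvInputFormat[CovidEvent](new Path("hdfs:///data/covid.csv"), typeInfo)

    // Turns the bounded input format into a DataStream.
    val events: DataStream[CovidEvent] = env.createInput(csvInputFormat)(typeInfo)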

RE: What is the best way to have a cache of an external database in Flink?

2021-01-22 Thread Jan Oelschlegel
But then you need a way to consume a database as a DataStream. I found this one: https://github.com/ververica/flink-cdc-connectors. I want to implement a similar use case, but I don't know how to parse the SourceRecord (which comes from the connector) into a POJO for further processing. Best,
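A rough sketch of what parsing the SourceRecord into a POJO can look like with that connector's DebeziumDeserializationSchema, assuming the 1.x package layout (com.alibaba.ververica.cdc.debezium); the UserRow class and its id/name fields are hypothetical:

    import com.alibaba.ververica.cdc.debezium.DebeziumDeserializationSchema
    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.util.Collector
    import org.apache.kafka.connect.data.Struct
    import org.apache.kafka.connect.source.SourceRecord

    // Hypothetical target POJO.
    case class UserRow(id: Long, name: String)

    class UserRowDeserializer extends DebeziumDeserializationSchema[UserRow] {

      // The record value follows the Debezium envelope with "before", "after" and "op" fields.
      override def deserialize(record: SourceRecord, out: Collector[UserRow]): Unit = {
        val value = record.value().asInstanceOf[Struct]
        val after = value.getStruct("after")
        if (after != null) {
          out.collect(UserRow(after.getInt64("id"), after.getString("name")))
        }
      }

      override def getProducedType: TypeInformation[UserRow] =
        TypeInformation.of(classOf[UserRow])
    }

Such a deserializer is then usually handed to the connector's source builder (e.g. its deserializer(...) option), so that the resulting DataStream already carries the POJO type.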

AbstractMethodError while writing to parquet

2021-02-02 Thread Jan Oelschlegel
Hi all, I'm using Flink 1.11 with the DataStream API. I would like to write my results in Parquet format into HDFS. Therefore I generated an Avro SpecificRecord with the avro-maven-plugin: org.apache.avro:avro-maven-plugin:1.8.2

RE: AbstractMethodError while writing to parquet

2021-02-03 Thread Jan Oelschlegel
to look up such conflicts manually and then choose the same version as the Flink dependencies? Best, Jan From: Till Rohrmann Sent: Wednesday, 3 February 2021 11:41 To: Jan Oelschlegel Cc: user Subject: Re: AbstractMethodError while writing to parquet Hi Jan, it looks to me that you might

RE: AbstractMethodError while writing to parquet

2021-02-04 Thread Jan Oelschlegel
Rohrmann Sent: Thursday, 4 February 2021 10:08 To: Jan Oelschlegel Cc: user Subject: Re: AbstractMethodError while writing to parquet I guess it depends on where the other dependency is coming from. If you have multiple dependencies which conflict, then you have to resolve it. One way to detect

Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-17 Thread Jan Oelschlegel
Hi, I have a question regarding the Flink SQL connector for Kafka. I have 3 Kafka partitions and 1 Kafka SQL source connector (parallelism 1). The data within the Kafka partitions is sorted based on an event-time field, which is also my event time in Flink. My watermark is generated with a delay of
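For reference, a minimal sketch of such a Kafka SQL source with a delayed watermark on Flink 1.11; the topic name, fields and the 10-second delay are placeholders:

    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
    import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tableEnv = StreamTableEnvironment.create(env)

    // WATERMARK declares event_time as the event-time attribute and tolerates
    // events arriving up to 10 seconds out of order.
    tableEnv.executeSql(
      """
        |CREATE TABLE covid_events (
        |  event_id   STRING,
        |  event_time TIMESTAMP(3),
        |  WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
        |) WITH (
        |  'connector' = 'kafka',
        |  'topic' = 'covid-events',
        |  'properties.bootstrap.servers' = 'localhost:9092',
        |  'properties.group.id' = 'covid-sql-job',
        |  'scan.startup.mode' = 'earliest-offset',
        |  'format' = 'json'
        |)
        |""".stripMargin)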

RE: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-18 Thread Jan Oelschlegel
By using the DataStream API with the same business logic I'm getting no dropped events. From: Jan Oelschlegel Sent: Wednesday, 17 February 2021 19:18 To: user Subject: Kafka SQL Connector: dropping events if more partitions than source tasks Hi, I have a question regarding the Flink

RE: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-19 Thread Jan Oelschlegel
If I increase the watermark delay, fewer events get dropped. But why does the DataStream API job still work with a 12-hour watermark delay? By the way, I'm using Flink 1.11. It would be nice if someone could give me some advice. Best, Jan From: Jan Oelschlegel Sent: Thursday

RE: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-24 Thread Jan Oelschlegel
https://issues.apache.org/jira/browse/FLINK-20041 Best, Jan From: Arvid Heise Sent: Wednesday, 24 February 2021 14:10 To: Jan Oelschlegel Cc: user; Timo Walther Subject: Re: Kafka SQL Connector: dropping events if more partitions than source tasks Hi Jan, are you running on historic data? Then

RE: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-25 Thread Jan Oelschlegel
February 2021 00:04 To: Jan Oelschlegel Cc: Arvid Heise; user; Timo Walther Subject: Re: Kafka SQL Connector: dropping events if more partitions than source tasks Hi Jan, what you are observing is correct for the current implementation. Current watermark generation is based on subtask
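For comparison, a sketch of how the DataStream job mentioned earlier in the thread can assign its watermarks directly on the Kafka consumer, so that watermarks are generated per Kafka partition rather than per source subtask; CovidEventDeserializationSchema and the eventTime field are hypothetical:

    import java.time.Duration
    import java.util.Properties

    import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "covid-datastream-job")

    val consumer = new FlinkKafkaConsumer[CovidEvent](
      "covid-events", new CovidEventDeserializationSchema, props)

    // When the strategy is set on the consumer itself, the source tracks watermarks
    // per Kafka partition, so one subtask reading several partitions does not merge
    // their timelines before watermarking.
    consumer.assignTimestampsAndWatermarks(
      WatermarkStrategy
        .forBoundedOutOfOrderness[CovidEvent](Duration.ofHours(12))
        .withTimestampAssigner(new SerializableTimestampAssigner[CovidEvent] {
          override def extractTimestamp(e: CovidEvent, recordTs: Long): Long = e.eventTime
        }))

    val events: DataStream[CovidEvent] = env.addSource(consumer)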

RE: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-02-26 Thread Jan Oelschlegel
case I have observed that with a larger number of source tasks no results are produced. Best, Jan From: Shengkai Fang Sent: Friday, 26 February 2021 15:32 To: Jan Oelschlegel Cc: Benchao Li; Arvid Heise; user; Timo Walther Subject: Re: Kafka SQL Connector: dropping events if more

RE: Kafka SQL Connector: dropping events if more partitions than source tasks

2021-03-01 Thread Jan Oelschlegel
https://issues.apache.org/jira/browse/FLINK-20041 Best, Jan From: Shengkai Fang Sent: Saturday, 27 February 2021 05:03 To: Jan Oelschlegel Cc: Benchao Li; Arvid Heise; user; Timo Walther Subject: Re: Kafka SQL Connector: dropping events if more partitions than source tasks Hi Jan, thanks for your reply. Do you set the

State Schema Evolution within SQL API

2021-03-01 Thread Jan Oelschlegel
Hi all, I would like to know to what extent state schema evolution is possible when using the SQL API of Flink. Which query changes can I make without disrupting the schema of my savepoint? The documentation describes the do's and don'ts of a safe schema evolution only for the DataStream API

RE: Flink upgrade causes operator to lose state

2021-03-04 Thread Jan Oelschlegel
Maybe this strategy for evolving an application without the need for a state restore could help you: https://docs.ververica.com/v2.3/user_guide/sql_development/sql_scripts.html#sql-script-changes -----Original Message----- From: Chesnay Schepler Sent: Wednesday, 3 March 2021 21:35 To: so

RE: State Schema Evolution within SQL API

2021-03-04 Thread Jan Oelschlegel
Wysakowicz Cc: Jan Oelschlegel; user Subject: Re: State Schema Evolution within SQL API Hello Dawid, I'm interested in this discussion because I'm currently trying to upgrade Flink from 1.9 to 1.12 for a bunch of SQL jobs running in production. From what you said, this seems to

RE: State Schema Evolution within SQL API

2021-03-04 Thread Jan Oelschlegel
I think simply changing the parallelism or upgrading the underlying Flink version in a cluster should be no problem. But you want to change your queries, right? Best, Jan From: Jan Oelschlegel Sent: Thursday, 4 March 2021 12:26 To: XU Qinghui; Dawid Wysakowicz Cc: user Subject: RE

Flink Non-Heap Memory Configuration

2021-03-10 Thread Jan Oelschlegel
Hi, I'm using Flink 1.11.3. With Prometheus and Grafana I get some metrics from my standalone cluster. This simple calculation yields very little RAM (less than 10 MB): flink_jobmanager_Status_JVM_Memory_NonHeap_Committed - flink_jobmanager_Status_JVM_Memory_NonHeap_Used Is that the r

JobManager timeout / long-running batch job

2021-03-11 Thread Jan Oelschlegel
Hi, I'm using Flink 1.11.3 and run a batch job. In the log of the JobManager I see that all operators switched from running to finished. And then there is a timeout of the JobManager. And after some pause the overall status switches from running to finished. Why is there a big gap in between