Runtime issue while using statefun-datastream v3.3.0

2024-06-19 Thread RAN JIANG
Hi all, We are trying to leverage statefun datastream features. After adding *org.apache.flink:statefun-flink-datastream:3.3.0* to our gradle file, we are experiencing a runtime error like this: *Caused by: java.lang.NoSuchMethodError: 'com.google.protobuf.Descriptors$FileDescriptor
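
A NoSuchMethodError on com.google.protobuf.Descriptors usually indicates two different protobuf-java versions on the classpath. A hedged Gradle sketch for diagnosing and pinning the version (the 3.21.7 pin is an assumption; align it with whatever version statefun-flink-datastream 3.3.0 was actually built against):

```groovy
// Inspect which protobuf versions are being pulled in:
//   ./gradlew dependencies --configuration runtimeClasspath | grep protobuf
configurations.all {
    resolutionStrategy {
        // Force a single protobuf-java version everywhere.
        // 3.21.7 is a placeholder -- match the version StateFun expects.
        force 'com.google.protobuf:protobuf-java:3.21.7'
    }
}
```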

Re: A way to meter number of deserialization errors

2024-06-19 Thread Ilya Karpov
Does anybody experience the problem of metering deserialization errors? On Mon, Jun 17, 2024 at 14:39, Ilya Karpov wrote: > Hi all, > we are planning to use flink as a connector between kafka and > external systems. We use protobuf as a message format in kafka. If > non-backward compatible changes

Checkpoints and windows size

2024-06-18 Thread banu priya
Hi All, I have a flink job with key by, a tumbling window (2 sec window, processing time) and an aggregator. How often should I run the checkpoint? I don't need the data to be retained after 2s. I want to use incremental checkpoints with rocksdb. Thanks Banupriya
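
For context, the checkpoint interval and the window size are independent: the checkpoint cadence only bounds how much input is replayed on failure, while the 2 s window state is dropped as soon as the window fires regardless of checkpointing. A possible starting configuration (interval values are illustrative, not a recommendation from this thread):

```yaml
# flink-conf.yaml sketch for this kind of job
state.backend: rocksdb
state.backend.incremental: true          # incremental RocksDB checkpoints
execution.checkpointing.interval: 10s    # independent of the 2s window size
execution.checkpointing.min-pause: 5s    # breathing room between checkpoints
```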

Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Thread Paul Lam
Well done! Thanks a lot for your hard work! Best, Paul Lam > On Jun 19, 2024 at 09:47, Leonard Xu wrote: > > Congratulations! Thanks Qingsheng for the release work and all contributors > involved. > > Best, > Leonard > >> On Jun 18, 2024 at 11:50 PM, Qingsheng Ren wrote: >> >> The Apache Flink community is very

Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Thread Leonard Xu
Congratulations! Thanks Qingsheng for the release work and all contributors involved. Best, Leonard > On Jun 18, 2024 at 11:50 PM, Qingsheng Ren wrote: > > The Apache Flink community is very happy to announce the release of Apache > Flink CDC 3.1.1. > > Apache Flink CDC is a distributed data integration

[ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Thread Qingsheng Ren
The Apache Flink community is very happy to announce the release of Apache Flink CDC 3.1.1. Apache Flink CDC is a distributed data integration tool for real time data and batch data, bringing the simplicity and elegance of data integration via YAML to describe the data movement and transformation

Elasticsearch 8 - FLINK-26088

2024-06-18 Thread Tauseef Janvekar
Dear Team, As per https://issues.apache.org/jira/browse/FLINK-26088, elasticsearch 8 support is already added but I do not see it in any documentation. Also the last version that supports any elasticsearch is 1.17.x. Can I get the steps on how to integrate with elastic 8 and some sample code

[no subject]

2024-06-18 Thread Dat Nguyen Tien

RE: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-18 Thread gwenael . lebarzic
Hello Rob. This workaround works indeed! Cdt. Gwenael Le Barzic From: Robert Young Sent: Tuesday, June 18, 2024 03:54 To: LE BARZIC Gwenael DTSI/SI Cc: user@flink.apache.org Subject: Re: Problem reading a CSV file with pyflink datastream in k8s with

Help required to fix security vulnerabilities in Flink Docker image

2024-06-18 Thread elakiya udhayanan
Hi Community, In one of our applications we are using a Flink Docker image and running Flink as a Kubernetes pod. As per policy, we tried scanning the Docker image for security vulnerabilities using JFrog Xray and we find that there are multiple critical vulnerabilities being reported as seen in

Re: Flink Stateful Functions 3.4

2024-06-17 Thread Zhanghao Chen
There's been no active maintenance of the StateFun project since the release of v3.3 last September, per the GitHub commit history [1]. So currently, there's no estimate on when v3.4 could be released or which Flink version would be supported. [1]

Re: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-17 Thread Robert Young
Hi Gwenael, From the logs I thought it was a JVM module opens/exports issue, but I found it had a similar issue using a java8 base image too. I think the issue is that it's not permitted for PythonCsvUtils to call the package-private constructor of CsvReaderFormat across class loaders. One

Re: Problems with multiple sinks using postgres-cdc connector

2024-06-17 Thread Hongshun Wang
Hi David, In your modified pipeline, just one source from table1 is sufficient, with both sink1 and process2 sharing a single source from process1. However, based on your log, it appears that two sources have been generated. Do you have the execution graph available in the Flink UI? Best,

Flink Stateful Functions 3.4

2024-06-17 Thread L. Jiang
Hi there, Anyone knows which Flink version that Flink Stateful Functions 3.4 is compatible with? https://nightlies.apache.org/flink/flink-statefun-docs-master/docs/deployment/state-bootstrap/ I know Stateful Functions 3.3 is compatible with Flink 1.16.2, and Stateful Functions 3.2 is good with

Re: Problems with multiple sinks using postgres-cdc connector

2024-06-17 Thread David Bryson
These sinks share the same source. The pipeline that works looks something like this: table1 -> process1 -> process2 -> sink2 When I change it to this: table1 -> process1 -> process2 -> sink2 `--> sink1 I get the errors described, where it appears that a second

A way to meter number of deserialization errors

2024-06-17 Thread Ilya Karpov
Hi all, we are planning to use flink as a connector between kafka and external systems. We use protobuf as a message format in kafka. If non-backward compatible changes occur we want to skip those messages ('protobuf.ignore-parse-errors' = 'true') but record an error and raise an alert. I didn't
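
For reference, the option quoted above drops unparsable messages silently; a table DDL using it might look like the sketch below (topic, schema, and message class are placeholders). Counting the dropped records would still require a custom deserialization format exposing a metric, or an external offset/row-count comparison; the option alone does not record anything.

```sql
CREATE TABLE events (
  id BIGINT,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',                                   -- placeholder
  'properties.bootstrap.servers' = 'kafka:9092',        -- placeholder
  'format' = 'protobuf',
  'protobuf.message-class-name' = 'com.example.Event',  -- placeholder
  'protobuf.ignore-parse-errors' = 'true'               -- skip bad records instead of failing
);
```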

RE: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-17 Thread gwenael . lebarzic
Hello everyone. Does someone know how to solve this please? Cdt. Gwenael Le Barzic Ingénieur technique techno BigData Orange/OF/DTSI/SI/DATA-IA/SOLID/CXP Mobile : +33 6 48 70 85 75

Re: Problems with multiple sinks using postgres-cdc connector

2024-06-17 Thread Hongshun Wang
Hi David, > When I add this second sink, the postgres-cdc connector appears to add a second reader from the replication log, but with the same slot name. I don't understand what you mean by adding a second sink. Do they share the same source, or does each have a separate pipeline? If the former

Discussion of flink checkpoint latency performance issues

2024-06-16 Thread 15868861416
Hi all, Background: In actual tests of Flink reading Kafka data and writing to Hudi, with a checkpoint interval of 1 min and state.backend set to filesystem, the results are as follows: checkpoint latency when writing to Hudi; latency when writing to Iceberg. Question: Hudi's checkpoint file data is much larger than Iceberg's; how can the checkpoint latency of Flink writing to Hudi be reduced? | | 博星 | | 15868861...@163.com |

Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-16 Thread Junrui Lee
Hi Biao, I agree with you that this exception is not very meaningful and can be noisy in the JM logs, especially when running large-scale batch jobs in a session cluster. IIRC, there isn't a current config to filter out or silence such exceptions in batch mode. So I've created a JIRA ticket (

raw issues

2024-06-16 Thread Fokou Toukam, Thierry
Hello, I am trying to do vector assembling with Flink 1.15 but I get this. How can I solve it please? 2024-06-16 03:47:24 DEBUG Main:114 - Assembled Data Table Schema: root |-- tripId: INT |-- stopId: INT |-- routeId: INT |-- stopSequence: INT |-- speed: DOUBLE |-- currentStatus: INT

Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-15 Thread Geng Biao
Hi Junrui, Thanks for your answer! Since this exception is not very meaningful, is there a solution or a flink config to filter out or silence such exceptions in batch mode? When I run some large scale batch jobs in a session cluster, it turns out that the JM log will be filled with this

Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-15 Thread Junrui Lee
Hi, This exception is common in batch jobs and is caused by the collect sink attempting to fetch data from the corresponding operator coordinator on the JM based on the operator ID. However, due to the sequential scheduling of batch jobs, if a job vertex has not been initialized yet, the

Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-15 Thread Corin
When I run a batch job using Flink 1.19, I used collect() in the job, and many times the following error appears in the JobManager log: Caused by: org.apache.flink.util.FlinkException: Coordinator of operator does not exist or the job vertex this operator belongs to is not initialized.

Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-14 Thread gwenael . lebarzic
Hello everyone. I get the following error when trying to read a CSV file with pyflink datastream in a k8s environment using the flink operator. ### File "/opt/myworkdir/myscript.py", line 30, in run_flink_job(myfile) File "/opt/myworkdir/myscript.py", line 21, in run_flink_job

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Gunnar Morling
Hey, I ran into some issues with PyFlink on Kubernetes myself a while ago. Blogged about it here, perhaps it's useful: https://www.decodable.co/blog/getting-started-with-pyflink-on-kubernetes Best, --Gunnar Am Fr., 14. Juni 2024 um 20:58 Uhr schrieb Mate Czagany : > Hi, > > You can refer

RE: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread gwenael . lebarzic
To be sure about that, I can see this in the doc : # install PyFlink COPY apache-flink*.tar.gz / RUN pip3 install /apache-flink-libraries*.tar.gz && pip3 install /apache-flink*.tar.gz Is the result the same as this command below : RUN pip install --no-cache-dir -r requirements.txt With
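
For reference, installing a requirements.txt alone does not replace the documented step: the apache-flink package itself must be installed and must match the Flink version of the base image. A hedged Dockerfile sketch (the 1.18.1 version and file names are placeholders):

```dockerfile
# Sketch: extend the official Flink image and add Python plus PyFlink,
# following the approach documented in the Flink docs.
FROM flink:1.18.1
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip && \
    ln -s /usr/bin/python3 /usr/bin/python && \
    rm -rf /var/lib/apt/lists/*
# Installing apache-flink from PyPI has the same effect as the quoted
# COPY/RUN lines, as long as the version matches the base image.
RUN pip3 install --no-cache-dir apache-flink==1.18.1
COPY myscript.py /opt/myworkdir/myscript.py
```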

RE: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread gwenael . lebarzic
Thank you for your answer, Mate o/ Cdt. Gwenael Le Barzic Ingénieur technique techno BigData Orange/OF/DTSI/SI/DATA-IA/SOLID/CXP Mobile : +33 6 48 70 85 75

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Mate Czagany
Oops, forgot the links, sorry about that [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker [2] https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-python-example/Dockerfile Mate

Problems with multiple sinks using postgres-cdc connector

2024-06-14 Thread David Bryson
Hi, I have a stream reading from postgres-cdc connector version 3.1.0. I read from two tables: flink.cleaned_migrations public.cleaned I convert the tables into a datastream, do some processing, then write it to a sink at the end of my stream: joined_table_result =

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Mate Czagany
Hi, You can refer to the example Dockerfile in the Flink docs [1] and you can also take a look at the example found in the Flink Kubernetes Operator repo [2]. The second Dockerfile won't work because it is missing all Flink libraries, if I am not mistaken. Regards, Mate wrote (date: 2024.

Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread gwenael . lebarzic
Hello everyone. I'm contacting you because I'm encountering some strange difficulties with pyflink on Kubernetes using the flink operator. So, first things first, I was wondering which base image I should use for my python image that I will then deploy on my Kubernetes cluster? Can I use flink

Re: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns

2024-06-14 Thread zboyu0104
How do I unsubscribe? from Alibaba Mail iPhone -- From: 谢县东 Date: 2024-06-06 16:07:05 To: Subject: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns Hi all: flink version: 1.13.6 I am using the flink-connector-hbase connector to write data into hbase via flinkSQL; the hbase table is created as follows: CREATE TABLE hbase_test_db_test_table_xxd

Re: flink cdc 3.0 schema change question

2024-06-13 Thread Yanquan Lv
Hi, the DataStream approach requires setting the includeSchemaChanges(true) parameter and configuring a custom deserializer; see this link [1]. If you don't want to use the JSON approach and prefer a custom deserializer, the approach of extracting DDL from the SourceRecord can follow the solution provided at this link [2]. [1]

Re: flink cdc 3.0 schema change question

2024-06-12 Thread Xiqian YU
Hi Zapjone, The current Schema Evolution implementation relies on the Pipeline connectors and framework that pass CDC Event records along. If you want to insert custom operator logic, it is suggested to refer to how the FlinkPipelineComposer class in the flink-cdc-composer module builds the operator-chain job, and insert your custom Operator there to implement your business logic. Also, for some simple processing logic, if it can be expressed with the Route and Transform features of YAML jobs, writing the corresponding YAML rules directly is simpler. Best! Regards, yux From: zapjone
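
As an illustration of the Route and Transform YAML features mentioned above, a minimal pipeline definition might look like the sketch below (source/sink types, hosts, and table names are all placeholders):

```yaml
# Illustrative Flink CDC pipeline YAML; all connection values are placeholders.
source:
  type: mysql
  hostname: mysql-host
  port: 3306
  username: flink
  password: "****"
  tables: app_db.\.*

sink:
  type: doris
  fenodes: doris-fe:8030

transform:
  # Reshape one table without writing a custom operator.
  - source-table: app_db.orders
    projection: id, amount, UPPER(status) AS status

route:
  # Redirect a source table to a differently named sink table.
  - source-table: app_db.orders
    sink-table: ods_db.ods_orders
```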

flink cdc 3.0 schema change question

2024-06-12 Thread zapjone
Hi all: I'd like to ask: flink cdc 3.0 supports schema changes, but it seems to be pipeline-based. For business reasons we need to use datastream for special processing, so I'd like to ask: how can schema changes be used with the datastream api in flink cdc 3.0? Or is there related documentation?

Re: changelogstream deletion issue

2024-06-12 Thread Xuyang
Hi, could you try using a statement set [1] to write this query to a print sink at the same time? You can check the print sink's results in the TM log and see whether there is any -D type data in it. If not, it proves the test_changelog source table may simply have no -D data; if there is, you will need to further investigate the behavioral differences of the sink table between DataStream and SQL. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/insert/#insert-into-multiple-tables [2]
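
A sketch of the suggested statement-set check (the lake_sink table name is hypothetical; test_changelog is the source table from the thread):

```sql
-- Clone the sink's schema into a print table for debugging.
CREATE TABLE debug_print WITH ('connector' = 'print')
LIKE lake_sink (EXCLUDING OPTIONS);

EXECUTE STATEMENT SET
BEGIN
  INSERT INTO lake_sink   SELECT * FROM test_changelog;
  INSERT INTO debug_print SELECT * FROM test_changelog;  -- -D rows show up in the TM log
END;
```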

Re: Understanding flink-autoscaler behavior

2024-06-12 Thread Chetas Joshi
Got it. Thanks! On Wed, Jun 12, 2024 at 6:49 PM Zhanghao Chen wrote: > > Does this mean it won't trigger a checkpoint before scaling up or > scaling down? > > The in-place rescaling won't do that. > > > Do I need the autotuning to be turned on for exactly once processing? > > It suffices to

Re: Understanding flink-autoscaler behavior

2024-06-12 Thread Zhanghao Chen
> Does this mean it won't trigger a checkpoint before scaling up or scaling > down? The in-place rescaling won't do that. > Do I need the autotuning to be turned on for exactly once processing? It suffices to just go back to the full-restart upgrade mode provided by the operator: disable
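
A hedged FlinkDeployment sketch of this setup (field values are illustrative, and the idea that omitting the adaptive scheduler setting routes scaling through the full savepoint-based upgrade cycle is my reading of the reply above; check the operator docs for your versions):

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-pipeline                 # placeholder
spec:
  flinkVersion: v1_18
  flinkConfiguration:
    job.autoscaler.enabled: "true"
    # Omitting "jobmanager.scheduler: adaptive" here means scaling decisions
    # go through the regular upgrade cycle (stop with savepoint, redeploy)
    # instead of in-place rescaling.
  job:
    jarURI: local:///opt/flink/usrlib/job.jar   # placeholder
    upgradeMode: savepoint          # preserves state across rescales
```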

Re: Understanding flink-autoscaler behavior

2024-06-12 Thread Chetas Joshi
Hi Zhanghao, If I am using the autoscaler (flink-k8s-operator) without enabling auto-tuning, the documentation says it triggers in-place scaling without going through a full job upgrade lifecycle. Does this mean it won't trigger a checkpoint before scaling up or scaling down? I am observing that

Re: Slack Community Invite?

2024-06-12 Thread Ufuk Celebi
FYI This was recently fixed on the website, too. > On 28. May 2024, at 05:05, Alexandre Lemaire wrote: > > Thank you! > >> On May 27, 2024, at 9:35 PM, Junrui Lee wrote: >> >> Hi Alexandre, >> >> You can try this link: >>

Re: Setting uid hash for non-legacy sinks

2024-06-12 Thread Salva Alcántara
{ "emoji": "♥️", "version": 1 }

Re: Setting uid hash for non-legacy sinks

2024-06-12 Thread Gabor Somogyi
Hey Salva, Good to hear your issue has been resolved one way or another! Thanks for confirming that this operator hash trick is working on V2 as well. G On Wed, Jun 12, 2024 at 5:20 AM Salva Alcántara wrote: > Hey Gabor, > > I didn't finally need to keep compatibility with existing

Re: Changing TTL config without breaking state

2024-06-12 Thread xiangyu feng
Hi Salva, Unfortunately, the state is currently incompatible between enabling TTL and disabling TTL. This issue is tracked by this jira( https://issues.apache.org/jira/browse/FLINK-32955) and not resolved yet. Usually, we need to find a way to reaccumulate the state after enabling the state TTL.

Changing TTL config without breaking state

2024-06-11 Thread Salva Alcántara
I have some jobs where I can configure the TTL duration for certain operator state. The problem I'm noticing is that when I make changes in the TTL configuration the new state descriptor becomes incompatible and I cannot restart my jobs from current savepoints. Is that expected? More precisely,
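
For reference, TTL is attached to the state descriptor roughly like this (a sketch; the descriptor name, type, and duration are illustrative). Consistent with this thread, changing only the TTL duration keeps the descriptor compatible, while toggling TTL on or off changes the serialized state format:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Illustrative TTL setup; adjusting Time.hours(24) alone does not
// break savepoint compatibility, but removing enableTimeToLive does.
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.hours(24))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

ValueStateDescriptor<Long> desc = new ValueStateDescriptor<>("my-state", Long.class);
desc.enableTimeToLive(ttlConfig);
```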

Re: Force to commit kafka offset when stop a job.

2024-06-11 Thread Lei Wang
I tried; it seems the offset will not be committed when doing a savepoint. After submitting a flink job, I just use the following cmd to see the committed offset: bin/kafka-consumer-groups.sh --bootstrap-server xxx:9092 --describe --group groupName TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG

[ANNOUNCE] Apache flink-connector-opensearch 2.0.0 released

2024-06-11 Thread Sergey Nuyanzin
The Apache Flink community is very happy to announce the release of Apache flink-connector-opensearch 2.0.0. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for

[ANNOUNCE] Apache flink-connector-opensearch 1.2.0 released

2024-06-11 Thread Sergey Nuyanzin
The Apache Flink community is very happy to announce the release of Apache flink-connector-opensearch 1.2.0 for Flink 1.18 and Flink 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Zhanghao Chen
There's no such option yet. However, it might not be a good idea to silently ignore the exception and restart from fresh state, which violates data integrity. Instead, the job should be marked as terminally failed in this case (maybe after a few retries) and just leave users or an external

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Jean-Marc Paulin
Thanks for your reply. Yes, this is indeed an option. But I was more after a config option to handle that scenario. If the HA metadata points to a checkpoint that is obviously not present (err 404 in the S3 case) there is little value in retrying. The HA data are obviously worthless in that

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Zhanghao Chen
Hi, In this case, you could cancel the job using the flink stop command, which will clean up Flink HA metadata, and resubmit the job. Best, Zhanghao Chen From: Jean-Marc Paulin Sent: Monday, June 10, 2024 18:53 To: user@flink.apache.org Subject: Failed to

changelogstream deletion issue

2024-06-10 Thread zapjone
Hi all: While reading MySQL data in real time with the datastream api, I convert the datastream into a changelog table via tableEnv.fromChangelogStream, then use SQL to write the changelog table data into a data lake. In testing, insert and update work correctly, but delete does not perform the deletion; when testing with pure SQL, deletion works. (Some logic requires api operations, so the SQL-only approach was not used.) Code: StreamExecutionEnvironment env = ...; StreamTableEnvironment tableEnv = ...; MySqlSource

Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Jean-Marc Paulin
Hi, We have a 1.19 Flink streaming job, with HA enabled (ZooKeeper), checkpoint/savepoint in S3. We had an outage and now the jobmanager keeps restarting. We think it because it read the job id to be restarted from ZooKeeper, but because we lost our S3 Storage as part of the outage it cannot

Re: Setting uid hash for non-legacy sinks

2024-06-10 Thread Gabor Somogyi
YW, ping me back whether it works because it's a nifty feature. G On Mon, Jun 10, 2024 at 9:26 AM Salva Alcántara wrote: > Thanks Gabor, I will give it a try! > > On Mon, Jun 10, 2024 at 12:01 AM Gabor Somogyi > wrote: > >> Now I see the intention and then you must have a V2 sink, right?

Re: Setting uid hash for non-legacy sinks

2024-06-10 Thread Salva Alcántara
Thanks Gabor, I will give it a try! On Mon, Jun 10, 2024 at 12:01 AM Gabor Somogyi wrote: > Now I see the intention and then you must have a V2 sink, right? Maybe you > look for the following: > > final String writerHash = "f6b178ce445dc3ffaa06bad27a51fead"; > final String committerHash =

Re: How to Use AsyncLookupFunction

2024-06-10 Thread Krzysztof Chmielewski
Hi there Clemens. This one might be a little bit tricky. You can take a look at https://github.com/getindata/flink-http-connector/blob/main/src/main/java/com/getindata/connectors/http/internal/table/lookup/AsyncHttpTableLookupFunction.java We implemented our http connector using async lookup

How to Use AsyncLookupFunction

2024-06-09 Thread Clemens Valiente
Hi, how do I implement AsyncLookupFunctions correctly? I implemented an AsyncLookupFunction; the eval function has the following signature: https://nightlies.apache.org/flink/flink-docs-release-1.17/api/java/org/apache/flink/table/functions/AsyncLookupFunction.html eval(CompletableFuture> future,

Re: Setting uid hash for non-legacy sinks

2024-06-09 Thread Gabor Somogyi
Now I see the intention and then you must have a V2 sink, right? Maybe you look for the following: final String writerHash = "f6b178ce445dc3ffaa06bad27a51fead"; final String committerHash = "68ac8ae79eae4e3135a54f9689c4aa10"; final CustomSinkOperatorUidHashes operatorsUidHashes =
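
Put together, the V2-sink variant of this looks roughly like the sketch below (the hash values are the examples from the reply, not real savepoint hashes; replace them with the operator IDs recovered from your savepoint):

```java
import org.apache.flink.streaming.api.datastream.CustomSinkOperatorUidHashes;

// Pin the writer/committer operator hashes so an existing savepoint
// still maps onto the new sink topology.
CustomSinkOperatorUidHashes uidHashes = CustomSinkOperatorUidHashes.builder()
    .setWriterUidHash("f6b178ce445dc3ffaa06bad27a51fead")      // example value
    .setCommitterUidHash("68ac8ae79eae4e3135a54f9689c4aa10")   // example value
    .build();

output.sinkTo(outputSink, uidHashes);
```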

Re: Setting uid hash for non-legacy sinks

2024-06-09 Thread Zhanghao Chen
Hi Salva, The SinkV2 transformation will be translated to multiple operators at the physical level. When setting a UID, Flink will automatically generate UID for sub-operators by filling the configured UID in a pre-defined naming template. The naming template is carefully maintained to ensure

Re: Setting uid hash for non-legacy sinks

2024-06-09 Thread Salva Alcántara
Hi Gabor, Yeah, I know this, but what if you initially forgot and now you want to add the uid "after the fact"? You need to get the operator/vertex id used by Flink for current savepoints and somehow set this id for the sink. With the uid method you would need to "hack" the existing hash (get a

Re: Understanding flink-autoscaler behavior

2024-06-07 Thread Zhanghao Chen
Hi, Reactive mode and the Autoscaler in Kubernetes operator are two different approaches towards elastic scaling of Flink. Reactive mode [1] has to be used together with the passive resource management approach of Flink (only Standalone mode takes this approach), where the TM number is

Re: Understanding flink-autoscaler behavior

2024-06-07 Thread Sachin Sharma
Hi, I have a question related to this. I am doing a POC with Kubernetes operator 1.8 and flink 1.18 version with Reactive mode enabled. I added some dummy slow and fast operators to the flink job and I can see there is back pressure accumulated, but I am not sure why my Flink task managers are

[ANNOUNCE] Apache flink-connector-kafka 3.2.0 released

2024-06-07 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-kafka 3.2.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is

[ANNOUNCE] Apache flink-connector-jdbc 3.2.0 released

2024-06-07 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-jdbc 3.2.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is

Issue with presto recovering savepoint files in Flink 1.19

2024-06-07 Thread Nora
Hi all, We have just upgraded to Flink 1.19 and we are experiencing some issue when the job tries to restore from *some* savepoints, not all. In these cases, the failure manifests when the job is unable to create a checkpoint after starting from savepoint, saying *Failure reason: Not all required

Re: State leak in tumbling windows

2024-06-07 Thread Adam Domanski
Hi Yanfei, I'm using the Flink SQL API, however isn't it the case that SQL is translated to DataStream's building blocks, so a SQL window is in fact e.g. SlicingWindowOperator? I see only such RocksDB instances under a TM's /tmp/flink-io-* directory: bash-4.4$ du -d 1 . -b 1 94262

Re: [RESULT][VOTE] flink-connector-jdbc 3.2.0, release candidate #1

2024-06-07 Thread Danny Cranmer
Apologies, this was RC2, not RC1. On Fri, Jun 7, 2024 at 11:12 AM Danny Cranmer wrote: > I'm happy to announce that we have unanimously approved this release. > > There are 7 approving votes, 3 of which are binding: > * Ahmed Hamdy > * Hang Ruan > * Leonard Xu (binding) > * Yuepeng Pan > *

[RESULT][VOTE] flink-connector-jdbc 3.2.0, release candidate #1

2024-06-07 Thread Danny Cranmer
I'm happy to announce that we have unanimously approved this release. There are 7 approving votes, 3 of which are binding: * Ahmed Hamdy * Hang Ruan * Leonard Xu (binding) * Yuepeng Pan * Zhongqiang Gong * Rui Fan (binding) * Weijie Guo (binding) There was one -1 vote that was cancelled. *

[ANNOUNCE] Apache flink-connector-cassandra 3.2.0 released

2024-06-07 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-cassandra 3.2.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The

JobManager not reporting insufficient memory

2024-06-07 Thread HouseStark
Hi Everyone, We are using the TableStream API of Flink v1.14.3 with HA Kubernetes enabled, along with Flink K8s Operator v1.6. One of the jobs, which had been running stably for a long time, started restarting frequently. Upon closer inspection, we observed that the container memory usage

[ANNOUNCE] Apache flink-connector-aws 4.3.0 released

2024-06-07 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-aws 4.3.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is

[ANNOUNCE] Apache flink-connector-gcp-pubsub 3.1.0 released

2024-06-07 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-gcp-pubsub 3.1.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The

Re: Flatmap "blocking" or not

2024-06-07 Thread Xiqian YU
Hi Alexandre, According to this Stack Overflow conversation[1], it seems both AsyncFunction and FlatMapFunction require that the output object fit into memory, which seems not feasible in the case you mentioned. Maybe it could be done by creating a customized Flink Source with the FLIP-27 new API?

Flatmap "blocking" or not

2024-06-07 Thread Alexandre KY
Hi, I am designing a Flink pipeline to process a stream of images (rasters, to be more accurate, which are quite heavy: up to a dozen GB). To distribute the processing of one image, we split it into tiles to which we apply the processing that doesn't require the whole image, before reassembling it.

Re: Setting uid hash for non-legacy sinks

2024-06-07 Thread Gabor Somogyi
Hi Salva, Just wondering, why is it not good to set the uid like this? ``` output.sinkTo(outputSink).uid("my-human-readable-sink-uid"); ``` From the mentioned UID, Flink is going to make the hash, which is consistent from the UID -> HASH transformation perspective. BR, G On Fri, Jun 7, 2024 at 7:54 AM

Re: Understanding flink-autoscaler behavior

2024-06-07 Thread Gyula Fóra
Hi! To simplify things you can generally look at TRUE_PROCESSING_RATE, SCALE_UP_RATE_THRESHOLD and SCALE_DOWN_RATE_THRESHOLD. If TPR is below the scale up threshold then we should scale up, and if it's above the scale down threshold then we scale down. In your case what we see for your source

Setting uid hash for non-legacy sinks

2024-06-06 Thread Salva Alcántara
Hi, I want to add the uid for my Kafka sink in such a way that I can still use the existing savepoint. The problem I'm having is that I cannot set the uid hash. If I try something like this: ``` output.sinkTo(outputSink).setUidHash("xyzb"); ``` I get the following

raw error

2024-06-06 Thread Fokou Toukam, Thierry
Hi everyone, why does flink consider DenseVector or Vector as a raw type: 'features: RAW('org.apache.flink.ml.linalg.Vector', '...')'? I'm trying to deploy my job on flink and I have this error. Server Response: org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute

RE: Does Application mode support multiple submissions in HA mode?

2024-06-06 Thread Junrui Lee
Currently, Application mode does not support multiple job submissions when HA is enabled. You can check the official documentation for this statement: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#application-mode

Does Application mode support multiple submissions in HA mode?

2024-06-06 Thread Steven Chen
Does Application mode support multiple submissions in HA mode?

Re: State leak in tumbling windows

2024-06-06 Thread Yanfei Lei
Hi Adam, Is your job a datastream job or a sql job? After I looked through the window-related code (I'm not particularly familiar with this part of the code), this problem should only exist in datastream. Adam Domanski wrote on Monday, June 3, 2024 at 16:54: > > Dear Flink users, > > I spotted the ever growing

Re:Memory table in pyflink

2024-06-06 Thread Xuyang
Hi, Currently, Flink does not have a concept similar to an in-memory table that allows you to temporarily store some data, because Flink itself does not store data and is a computing engine. May I ask what the purpose of using a temporary table is? Would it be possible to use a

Understanding flink-autoscaler behavior

2024-06-06 Thread Chetas Joshi
Hi Community, I want to understand the following logs from the flink-k8s-operator autoscaler. My flink pipeline running on 1.18.0 and using flink-k8s-operator (1.8.0) is not scaling up even though the source vertex is back-pressured. 2024-06-06 21:33:35,270 o.a.f.a.ScalingMetricCollector

Information Request

2024-06-06 Thread Fokou Toukam, Thierry
I want to set up a stream processing environment on an Ubuntu server for machine learning. Which version of Apache Flink do you recommend if I want to use Maven? Thierry FOKOU | IT M.A.Sc Student Département de génie logiciel et TI École de technologie supérieure | Université du Québec

Memory table in pyflink

2024-06-06 Thread Phil Stavridis
Hello, I am trying to create an in-memory table in PyFlink to use as a staging table after ingesting some data from Kafka but it doesn’t work as expected. I have used the print connector which prints the results but I need to use a similar connector that stores staging results. I have tried

Re: Force to commit kafka offset when stop a job.

2024-06-06 Thread Zhanghao Chen
Yes, the exact offset position will also be committed when doing the savepoint. Best, Zhanghao Chen From: Lei Wang Sent: Thursday, June 6, 2024 16:54 To: Zhanghao Chen ; ruanhang1...@gmail.com Cc: user Subject: Re: Force to commit kafka offset when stop a job.

[ANNOUNCE] Apache flink-connector-mongodb 1.2.0 released

2024-06-06 Thread Danny Cranmer
The Apache Flink community is very happy to announce the release of Apache flink-connector-mongodb 1.2.0 for Flink 1.18 and 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release

Re: Force to commit kafka offset when stop a job.

2024-06-06 Thread Lei Wang
Thanks Zhanghao && Hang. I am familiar with the flink savepoint feature. The exact offset position is stored in savepoint and the job can be resumed from the savepoint using the offset position that is stored in it. But I am not sure whether the exact offset position is committed to kafka when

When inserting data with the HBase connector, how to update only one column when a column family has multiple columns

2024-06-06 Thread 谢县东
Hi all: flink version: 1.13.6 I am using the flink-connector-hbase connector to write data into hbase via flinkSQL; the hbase table is created as follows: CREATE TABLE hbase_test_db_test_table_xxd ( rowkey STRING, cf1 ROW, PRIMARY KEY (rowkey) NOT ENFORCED ) WITH ( 'connector' = 'hbase-2.2', 'table-name' =

Re: Force to commit kafka offset when stop a job.

2024-06-05 Thread Zhanghao Chen
Hi, you could stop the job with a final savepoint [1], which will trigger a final offset commit on the final savepoint. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint Best, Zhanghao Chen
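
The stop-with-savepoint flow referenced here looks like this on the CLI (the job ID and savepoint path are placeholders):

```shell
# Stop gracefully, taking a final savepoint; the Kafka source commits its
# final offsets in the checkpoint-complete callback of this savepoint.
flink stop --savepointPath s3://my-bucket/savepoints <job-id>

# Later, resume from it:
flink run -s s3://my-bucket/savepoints/savepoint-xxxx myJob.jar
```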

Re: Force to commit kafka offset when stop a job.

2024-06-05 Thread Hang Ruan
Hi Lei. I think you could try to use `stop with savepoint` to stop the job. The offset will be committed when the checkpoint is finished. So I think `stop with savepoint` may be helpful. Best, Hang Lei Wang wrote on Thursday, June 6, 2024 at 01:16: > > When stopping a flink job that is consuming kafka messages, how to

Re: Flink job Deployement problem

2024-06-05 Thread Hang Ruan
Hi, Fokou Toukam. This error occurs when the schema in the sink mismatches the schema you provided from the upstream. You may need to check whether the provided type of the field `features` in the sink is the same as the type provided upstream. Best, Hang Fokou Toukam, Thierry on Thursday, June 6, 2024

Re: Flink job Deployement problem

2024-06-05 Thread Xiqian YU
Hi Fokou, It seems the `features` column was inferred to be RAW type, which doesn't carry any specific data information and causes the following type cast to fail. Sometimes this happens when Flink can't infer the return type from a lambda expression and no explicit return type information was
