Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Andrew Otto
> I’m curious if there is any reason for choosing Iceberg instead of Paimon? No technical reason that I'm aware of. We are using it mostly because of momentum. We looked at Flink Table Store (before it was Paimon), but decided it was too early and the docs were too sparse at the time to really

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Giannis Polyzos
I’m curious if there is any reason for choosing Iceberg instead of Paimon (other than Iceberg being more popular), especially for a use case like CDC, which Iceberg struggles to support. On Fri, 24 May 2024 at 3:22 PM, Andrew Otto wrote: > Interesting thank you! > > I asked this in the Paimon

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-24 Thread Andrew Otto
Interesting thank you! I asked this in the Paimon users group: How coupled to Paimon catalogs and tables is the cdc part of Paimon? RichCdcMultiplexRecord

Ways to detect a scaling event within a flink operator at runtime

2024-05-23 Thread Chetas Joshi
Hello, On a k8s cluster, I have the flink-k8s-operator 1.8 running with autoscaler = enabled (in-place) and a flinkDeployment (application mode) running 1.18.1. The flinkDeployment, i.e. the Flink streaming application, has a mock data producer as the source. The source generates data points

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Péter Váry
If I understand correctly, Paimon is sending `CdcRecord`-s [1] on the wire which contain not only the data, but the schema as well. With Iceberg we currently only send the row data, and expect to receive the schema on job start - this is more performant than sending the schema all the time, but

Pulsar connector resets existing subscription

2024-05-23 Thread Igor Basov
Hi everyone, I have a problem with how Flink deals with the existing subscription in a Pulsar topic. - Subscription has some accumulated backlog - Flink job is deployed from a clear state (no checkpoints) - Flink job uses the same subscription name as the existing one; the start

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Andrew Otto
Ah I see, so just auto-restarting to pick up new stuff. I'd love to understand how Paimon does this. They have a database sync action which will sync entire databases, handle schema evolution, and I'm

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Péter Váry
I will ask Marton about the slides. The solution was something like this in a nutshell: - Make sure that on job start the latest Iceberg schema is read from the Iceberg table - Throw a SuppressRestartsException when data arrives with the wrong schema - Use Flink Kubernetes Operator to restart

Re: issues with Flink kinesis connector

2024-05-23 Thread Nick Hecht
Thank you for your help! On Thu, May 23, 2024 at 1:40 PM Aleksandr Pilipenko wrote: > Hi Nick, > > You need to use another method to add a sink to your job - sinkTo. > KinesisStreamsSink implements the newer Sink interface, while addSink expects the > old SinkFunction. You can see this by looking at

Re: issues with Flink kinesis connector

2024-05-23 Thread Aleksandr Pilipenko
Hi Nick, You need to use another method to add a sink to your job - sinkTo. KinesisStreamsSink implements the newer Sink interface, while addSink expects the old SinkFunction. You can see this by looking at the method signatures[1] and at the usage examples in the documentation[2] [1]
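
For readers landing on this thread, a minimal Java sketch of the suggested fix, hedged: the stream name, region, and placeholder source are assumptions, not the poster's actual job.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Properties;

public class KinesisSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("a", "b", "c"); // placeholder source

        Properties sinkProperties = new Properties();
        sinkProperties.put("aws.region", "us-east-1"); // assumed region

        KinesisStreamsSink<String> sink = KinesisStreamsSink.<String>builder()
                .setKinesisClientProperties(sinkProperties)
                .setStreamName("my-stream") // hypothetical stream name
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                .build();

        // sinkTo accepts the new Sink interface; addSink only accepts the legacy SinkFunction.
        stream.sinkTo(sink);
        env.execute("kinesis-sink-sketch");
    }
}
```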

Re: Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-23 Thread Zhanghao Chen
Hi John, Based on the memory config screenshot provided before, each of your TMs should have MaxDirectMemory = 1 GB (network mem) + 128 MB (framework off-heap) = 1152 MB. Also check whether taskmanager.memory.flink.size and the total including MaxDirectMemory exceed the pod's physical memory; you may check the
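
A hedged sketch of how those numbers compose under Flink's default memory model (the network and framework off-heap values are assumptions taken from the documented taskmanager.memory.* defaults):

```
taskmanager.memory.flink.size:          16384m  # total Flink memory
taskmanager.memory.jvm-metaspace.size:   3072m

# With default fractions, inside flink.size:
#   network            = min(max(0.1 * flink.size, 64mb), 1gb)  -> capped at 1024m here
#   framework off-heap = 128m (default)
# Flink derives roughly:
#   -XX:MaxDirectMemorySize ~= network + framework off-heap + task off-heap (0 by default)
#                            = 1024m + 128m + 0m = 1152m
```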

Re: Task Manager memory usage

2024-05-23 Thread Zhanghao Chen
Hi Sigalit, States stored in memory will most probably stay alive for several rounds of GC, end up in the old generation of the heap, and won't get recycled until a full GC. As for the TM pod memory usage, most probably it will stop increasing at some point. You could try setting a

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Andrew Otto
Wow, I would LOVE to see this talk. If there is no recording, perhaps there are slides somewhere? On Thu, May 23, 2024 at 11:00 AM Carlos Sanabria Miranda < sanabria.miranda.car...@gmail.com> wrote: > Hi everyone! > > I have found in the Flink Forward website the following presentation: >

issues with Flink kinesis connector

2024-05-23 Thread Nick Hecht
Hello, I am currently having issues trying to use the Python Flink 1.18 DataStream API with the Amazon Kinesis Data Streams Connector. From the documentation https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kinesis/ I have downloaded the

"Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

2024-05-23 Thread Carlos Sanabria Miranda
Hi everyone! I have found in the Flink Forward website the following presentation: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" by

Re: Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-23 Thread John Smith
Based on these two settings... taskmanager.memory.flink.size: 16384m taskmanager.memory.jvm-metaspace.size: 3072m Reading the docs here I'm not sure how to calculate the formula. My suspicion is that I may have allocated too much of taskmanager.memory.flink.size and the total including

kirankumarkathe...@gmail.com-unsubscribe

2024-05-23 Thread Kiran Kumar Kathe
Kindly unsubscribe this gmail account kirankumarkathe...@gmail.com

Re: Task Manager memory usage

2024-05-23 Thread Sigalit Eliazov
Hi, thanks for your reply. We are storing the data in memory since it is short-term; we thought that adding RocksDB would add overhead. On Thu, May 23, 2024 at 4:38 PM Sachin Mittal wrote: > Hi > Where are you storing the state. > Try rocksdb. > > Thanks > Sachin > > > On Thu, 23 May 2024 at

Re: Task Manager memory usage

2024-05-23 Thread Sachin Mittal
Hi, where are you storing the state? Try RocksDB. Thanks, Sachin On Thu, 23 May 2024 at 6:19 PM, Sigalit Eliazov wrote: > Hi, > > I am trying to understand the following behavior in our Flink application > cluster. Any assistance would be appreciated. > > We are running a Flink application
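
A minimal Java sketch of the suggestion, assuming the job can switch to the RocksDB state backend; the checkpoint path is a placeholder:

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep large or short-lived state off the JVM heap; 'true' enables incremental checkpoints.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
        env.getCheckpointConfig().setCheckpointStorage("file:///tmp/checkpoints"); // placeholder path
    }
}
```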

Help with monitoring metrics of StateFun runtime with prometheus

2024-05-23 Thread Oliver Schmied
Dear Apache Flink community, I am setting up an Apache Flink StateFun runtime on Kubernetes, following the flink-statefun-playground example: https://github.com/apache/flink-statefun-playground/tree/main/deployments/k8s. This is the manifest I used for creating the StateFun environment: ```---

Task Manager memory usage

2024-05-23 Thread Sigalit Eliazov
Hi, I am trying to understand the following behavior in our Flink application cluster. Any assistance would be appreciated. We are running a Flink application cluster with 5 task managers, each with the following configuration: - jobManagerMemory: 12g - taskManagerMemory: 20g -

Re: About the splitVector permission issue with MongoDB

2024-05-23 Thread Jiabao Sun
Hi, splitVector is an internal MongoDB command used to compute chunk splits; in a replica-set deployment it can also be used to compute chunk ranges. Without the splitVector permission, the connector automatically falls back to the sample splitting strategy. Best, Jiabao evio12...@gmail.com wrote on Thu, May 23, 2024 at 16:57: > > hello~ > > > I am using the flink-cdc mongodb connector 2.3.0 >

About the splitVector permission issue with MongoDB

2024-05-23 Thread evio12...@gmail.com
hello~ I am using the flink-cdc mongodb connector 2.3.0 (https://github.com/apache/flink-cdc/blob/release-2.3/docs/content/connectors/mongodb-cdc.md). The documentation states that the mongo account needs these permissions: 'splitVector', 'listDatabases', 'listCollections', 'collStats', 'find', and 'changeStream'. The MongoDB deployment I am using is a replica-set, but I have learned that

Re: Flink kinesis connector 4.3.0 release estimated date

2024-05-23 Thread Vararu, Vadim
That’s great news. Thanks. From: Leonard Xu Date: Thursday, 23 May 2024 at 04:42 To: Vararu, Vadim Cc: user , Danny Cranmer Subject: Re: Flink kinesis connector 4.3.0 release estimated date Hey, Vararu The kinesis connector 4.3.0 release is under vote phase and we hope to finalize the

Re: Flink kinesis connector 4.3.0 release estimated date

2024-05-22 Thread Leonard Xu
Hey Vararu, The kinesis connector 4.3.0 release is in the vote phase and we hope to finalize the release work this week if everything goes well. Best, Leonard > On May 22, 2024 at 11:51 PM, Vararu, Vadim wrote: > > Hi guys, > > Any idea when the 4.3.0 kinesis connector is estimated to be released?

Flink Kubernetes Operator Pod Disruption Budget

2024-05-22 Thread Jeremy Alvis via user
Hello, In order to maintain at least one pod for both the Flink Kubernetes Operator and JobManagers through Kubernetes processes that use the Eviction API such as when draining a node, we have deployed Pod
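
The snippet is truncated above; for context, a hedged YAML sketch of such a PodDisruptionBudget (the name and labels are assumptions, not the poster's actual manifest):

```
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flink-kubernetes-operator-pdb   # hypothetical name
spec:
  minAvailable: 1                       # keep at least one pod through voluntary evictions
  selector:
    matchLabels:
      app.kubernetes.io/name: flink-kubernetes-operator
```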

Flink kinesis connector 4.3.0 release estimated date

2024-05-22 Thread Vararu, Vadim
Hi guys, Any idea when the 4.3.0 kinesis connector is estimated to be released? Cheers, Vadim.

[ANNOUNCE] Apache Celeborn 0.4.1 available

2024-05-22 Thread Nicholas Jiang
Hi all, Apache Celeborn community is glad to announce the new release of Apache Celeborn 0.4.1. Celeborn is dedicated to improving the efficiency and elasticity of different map-reduce engines and provides an elastic, high-efficient service for intermediate data including shuffle data, spilled

StateMigrationException while using stateTTL

2024-05-22 Thread irakli.keshel...@sony.com
Hello, I'm using Flink 1.17.1 and I have stateTTL enabled in one of my Flink jobs where I'm using RocksDB for checkpointing. I have a value state of a Pojo class (which is generated from an Avro schema). I added a new field to my schema along with a default value to make sure it is backwards
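
For context, a minimal Java sketch of the TTL setup this thread is about; the POJO and the state name are hypothetical stand-ins for the Avro-generated class:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlStateSketch {
    // Hypothetical POJO standing in for the Avro-generated class.
    public static class MyPojo {
        public String field;
    }

    public static ValueStateDescriptor<MyPojo> descriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.days(7))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<MyPojo> descriptor =
                new ValueStateDescriptor<>("my-ttl-state", MyPojo.class);
        // TTL wraps the state serializer, so changes to the TTL setting or to the
        // value serializer's layout are a common trigger for StateMigrationException.
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```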

Re: Get access to unmatching events in Apache Flink Cep

2024-05-22 Thread Anton Sidorov
In his answer Biao said "currently there is no such API to access the middle NFA state". Maybe that API exists in a plan? Or can I create an issue or pull request to add the API? On Fri, 17 May 2024 at 12:04, Anton Sidorov wrote: > Ok, thanks for the reply. > > On Fri, 17 May 2024 at 09:22, Biao Geng wrote: > >> Hi

IllegalStateException: invalid BLOB

2024-05-21 Thread Lars Skjærven
Hello, We're facing the bug reported in https://issues.apache.org/jira/browse/FLINK-32212 More specifically, when Kubernetes decides to drain a node, the job manager restarts (but not the task manager), and the job fails with: java.lang.IllegalStateException: The library registration references a

Re: Question about the iterate operation in the Flink 1.19 documentation

2024-05-20 Thread Xuyang
Hi, the Iterate API is deprecated in 1.19 and no longer supported; see [1][2]. The FLIP [1] provides an alternative approach [3]. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-357%3A+Deprecate+Iteration+API+of+DataStream [2] https://issues.apache.org/jira/browse/FLINK-33144 [3]

Re: Flink autoscaler with AWS ASG: checkpoint access issue

2024-05-20 Thread Chetas Joshi
Hello, After digging into the 403 issue a bit, I figured out that after the scale-up event, flink-s3-fs-presto uses the node profile instead of IRSA (IAM Roles for Service Accounts) on some of the newly created TM pods. 1. Has anyone else experienced this as well? 2. Verified that this is an issue

Question about the iterate operation in the Flink 1.19 documentation

2024-05-20 Thread www
Dear Flink development team: Hello! I am currently learning how to implement iterative algorithms, such as single-source shortest paths on a graph, with Apache Flink's DataStream API. In the Flink 1.18 documentation I noticed an introduction to the iterate operation; see: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/overview/#iterations However, I found that Flink

Re: After Flink SQL optimization, group by fields are missing

2024-05-20 Thread Lincoln Lee
Hi, you can try 1.17 or a newer version; this problem was fixed in flink 1.17.0 [1]. Doing this remove optimization in batch processing matches the semantics, but in streaming the field cannot simply be pruned; the documentation for the related time functions [2] has also been updated accordingly. [1] https://issues.apache.org/jira/browse/FLINK-30006 [2]

Re: What is the best way to aggregate data over a long window

2024-05-20 Thread gongzhongqiang
Hi Sachin, `performing incremental aggregation using stateful processing` is the same as windows with aggregation, but the former is more flexible. If the Flink window cannot satisfy your performance needs, and your business logic has features that can be customized for optimization, you can choose the

Re: Email submission

2024-05-20 Thread Hang Ruan
Hi, Michas. Please subscribe to the mailing list by sending an email to user-subscr...@flink.apache.org . Best, Hang Michas Szacillo (BLOOMBERG/ 919 3RD A) 于2024年5月19日周日 04:34写道: > Sending my email to join the apache user mailing list. > > Email: mszaci...@bloomberg.net >

Re: Restore from checkpoint

2024-05-20 Thread archzi lu
Hi Phil, correction: "But the error you have is a familiar error if you have written some code to handle directory paths." --> "But the error you have is a familiar error if you have written some code to handle directory paths with Java." No offence. Best regards, Jiadong Lu

Re: Restore from checkpoint

2024-05-20 Thread Jiadong Lu
Hi Phil, I don't have much expertise in the flink-python module. But the error you have is a familiar error if you have written some code to handle directory paths. The correct forms of Path/URI would be: 1. "/home/foo" 2. "file:///home/foo/boo" 3. "hdfs:///home/foo/boo" 4. or Win32

Re: flinksql 经过优化后,group by字段少了

2024-05-19 Thread Benchao Li
The calcite issue [1] you referenced was fixed in calcite-1.22.0, and Flink has been on that calcite version since 1.11. So which Flink version are you using? This feels like it may be a different problem. If you can reproduce it on the current latest version, 1.19, you can open an issue to report the bug. PS: I double-checked the behavior I described above; the distinction is only made at the code generation stage, so the optimizer cannot recognize it, and even in batch mode the optimized plan should still contain the dt field. [1]

Re: Restore from checkpoint

2024-05-19 Thread Jinzhong Li
Hi Phil, I think you can use the "-s :checkpointMetaDataPath" arg to resume the job from a retained checkpoint[1]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/#resuming-from-a-retained-checkpoint Best, Jinzhong Li On Mon, May 20, 2024 at 2:29 AM Phil
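
A hedged sketch of that invocation; the job id, chk directory, and script name are placeholders:

```
# Resume a PyFlink job from a retained checkpoint (paths are placeholders)
./bin/flink run \
  -s file:///opt/flink/checkpoints/<job-id>/chk-42 \
  -py my_job.py
```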

Re: flinksql 经过优化后,group by字段少了

2024-05-19 Thread Benchao Li
It looks like "dt = cast(CURRENT_DATE as string)" causes dt to be inferred as a constant, which is then optimized away. Optimizing CURRENT_DATE into a constant should only happen in batch mode; is this SQL running in batch mode? ℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> wrote on Sun, May 19, 2024 at 01:01: > create view tmp_view as > SELECT > dt, -- 2 > uid, -- 0 > uname, -- 1 > uage

Re: Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-19 Thread Jingsong Li
CC to the Paimon community. Best, Jingsong On Mon, May 20, 2024 at 9:55 AM Jingsong Li wrote: > Amazing, congrats! > Best, > Jingsong > On Sat, May 18, 2024 at 3:10 PM 大卫415 <2446566...@qq.com.invalid> wrote: > > Unsubscribe > > Original Email

Re: Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-19 Thread Jingsong Li
Amazing, congrats! Best, Jingsong On Sat, May 18, 2024 at 3:10 PM 大卫415 <2446566...@qq.com.invalid> wrote: > Unsubscribe > Original Email > Sender: "gongzhongqiang" <gongzhongqi...@apache.org> > Sent Time: 2024/5/17 23:10 > To: "Qingsheng Ren" <re...@apache.org> > Cc

Aw: Re: Advice Needed: Setting Up Prometheus and Grafana Monitoring for Apache Flink on Kubernetes

2024-05-19 Thread Oliver Schmied
Dear Biao Geng, thank you very much. With the help of your demo and the YAML configuration, I was able to successfully set up monitoring for my Apache Flink jobs. Thanks again for your time and help. Best regards, Oliver Sent: Sunday, 19 May 2024 at 17:42 From: "Biao

Re: Restore from checkpoint

2024-05-19 Thread Phil Stavridis
Hi Lu, Thanks for your reply. How should the paths be passed to the job that needs to use the checkpoint? Is the standard way using -s :/, or passing the path in the module as a Python arg? Kind regards Phil > On 18 May 2024, at 03:19, jiadong.lu wrote: > > Hi Phil, > > AFAIK,

Re: Advice Needed: Setting Up Prometheus and Grafana Monitoring for Apache Flink on Kubernetes

2024-05-19 Thread Biao Geng
Hi Oliver, I believe you are almost there. One thing I found could improve is that in your job yaml, instead of using: kubernetes.operator.metrics.reporter.prommetrics.reporters: prom kubernetes.operator.metrics.reporter.prommetrics.reporter.prom.factory.class:
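
The reply is truncated above; as a reference point, a hedged sketch of the usual factory-based reporter keys (the ports are placeholders):

```
# Flink job/cluster metrics (flink-conf.yaml):
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249-9260

# The operator's own metrics follow the same pattern under its prefix:
kubernetes.operator.metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
kubernetes.operator.metrics.reporter.prom.port: 9999
```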

Advice Needed: Setting Up Prometheus and Grafana Monitoring for Apache Flink on Kubernetes

2024-05-18 Thread Oliver Schmied
Dear Apache Flink Community, I am currently trying to monitor an Apache Flink cluster deployed on Kubernetes using Prometheus and Grafana. Despite following the official guide (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/metrics-logging/) on how

Email submission

2024-05-18 Thread Michas Szacillo (BLOOMBERG/ 919 3RD A)
Sending my email to join the apache user mailing list. Email: mszaci...@bloomberg.net

Re: problem with the heartbeat interval feature

2024-05-18 Thread Hongshun Wang
Hi Thomas, I have reviewed the code and just noticed that heartbeat.action.query is not mandatory. Debezium will generate Heartbeat Events at regular intervals. Flink CDC will then receive these Heartbeat Events and advance the offset[1]. Finally, the source reader will commit the offset during
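
For readers of this thread, a hedged Java sketch of setting that Debezium heartbeat property on the (pre-3.1, com.ververica) Postgres CDC source; all connection settings are placeholders:

```java
import com.ververica.cdc.connectors.postgres.PostgresSource;
import com.ververica.cdc.debezium.DebeziumSourceFunction;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

import java.util.Properties;

public class HeartbeatSourceSketch {
    public static DebeziumSourceFunction<String> build() {
        Properties dbzProps = new Properties();
        // Ask Debezium to emit heartbeat events every 10s so offsets can
        // advance even when the captured tables are idle.
        dbzProps.setProperty("heartbeat.interval.ms", "10000");

        return PostgresSource.<String>builder()
                .hostname("localhost")            // assumed connection settings
                .port(5432)
                .database("mydb")
                .schemaList("public")
                .tableList("public.my_table")
                .username("flink")
                .password("secret")
                .slotName("flink_slot")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .debeziumProperties(dbzProps)
                .build();
    }
}
```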

Re: Restore from checkpoint

2024-05-17 Thread jiadong.lu
Hi Phil, AFAIK, the error indicated your path was incorrect. your should use '/opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc' or 'file:///opt/flink/checkpoints/1875588e19b1d8709ee62be1cdcc' instead. Best. Jiadong.Lu On 5/18/24 2:37 AM, Phil Stavridis wrote: Hi, I am trying to test how

Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 Thread Muhammet Orazov via user
Amazing, congrats! Thanks for your efforts! Best, Muhammet On 2024-05-17 09:32, Qingsheng Ren wrote: The Apache Flink community is very happy to announce the release of Apache Flink CDC 3.1.0. Apache Flink CDC is a distributed data integration tool for real time data and batch data, bringing

Restore from checkpoint

2024-05-17 Thread Phil Stavridis
Hi, I am trying to test how the checkpoints work for restoring state, but not sure how to run a new instance of a flink job, after I have cancelled it, using the checkpoints which I store in the filesystem of the job manager, e.g. /opt/flink/checkpoints. I have tried passing the checkpoint as

Re: problem with the heartbeat interval feature

2024-05-17 Thread Thomas Peyric
thanks Hongshun for your response! On Fri, 17 May 2024 at 07:51, Hongshun Wang wrote: > Hi Thomas, > > The Debezium docs say: For the connector to detect and process events from > a heartbeat table, you must add the table to the PostgreSQL publication > specified by the publication.name >

Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 Thread gongzhongqiang
Congratulations! Thanks to all contributors. Best, Zhongqiang Gong Qingsheng Ren wrote on Fri, May 17, 2024 at 17:33: > The Apache Flink community is very happy to announce the release of > Apache Flink CDC 3.1.0. > > Apache Flink CDC is a distributed data integration tool for real time > data and

RocksDB - Failed to clip DB after initialization - end key comes before start key

2024-05-17 Thread Francesco Leone
Hi, We are facing a new issue related to RocksDB when deploying a new version of our job, which adds 3 more operators. We are using Flink 1.17.1 with RocksDB on Java 11. We get an exception from another pre-existing operator during its initialization. That operator and the new ones have

Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 Thread Hang Ruan
Congratulations! Thanks for the great work. Best, Hang Qingsheng Ren wrote on Fri, May 17, 2024 at 17:33: > The Apache Flink community is very happy to announce the release of > Apache Flink CDC 3.1.0. > > Apache Flink CDC is a distributed data integration tool for real time > data and batch data, bringing

Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 Thread Leonard Xu
Congratulations! Thanks Qingsheng for the great work and all contributors involved!! Best, Leonard > On May 17, 2024 at 5:32 PM, Qingsheng Ren wrote: > > The Apache Flink community is very happy to announce the release of > Apache Flink CDC 3.1.0. > > Apache Flink CDC is a distributed data integration

[ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 Thread Qingsheng Ren
The Apache Flink community is very happy to announce the release of Apache Flink CDC 3.1.0. Apache Flink CDC is a distributed data integration tool for real time data and batch data, bringing the simplicity and elegance of data integration via YAML to describe the data movement and transformation

Re: Get access to unmatching events in Apache Flink Cep

2024-05-17 Thread Anton Sidorov
Ok, thanks for the reply. On Fri, 17 May 2024 at 09:22, Biao Geng wrote: > Hi Anton, > > I am afraid that currently there is no such API to access the middle NFA > state in your case. For patterns that contain a 'within()' condition, the > timeout events can be retrieved via the TimedOutPartialMatchHandler

Re: SSL Kafka PyFlink

2024-05-17 Thread Evgeniy Lyutikov via user
Hi Phil, You need to specify the keystore with the CA location [1] [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/kafka/#security From: gongzhongqiang Sent: 17 May 2024 10:44:18 To: Phil Stavridis Cc:

Re: What is the best way to aggregate data over a long window

2024-05-17 Thread Sachin Mittal
Hi, I am doing the following: 1. Use the reduce function where the data type of the output after windowing is the same as the input. 2. Where the output data type after windowing is different from that of the input, I use the aggregate function. For example: SingleOutputStreamOperator data =
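
To illustrate point 2, a minimal hedged sketch of an AggregateFunction whose accumulator and output types differ from the input; the average-over-doubles example is an assumption, not the poster's actual job:

```java
import org.apache.flink.api.common.functions.AggregateFunction;

// Computes an average: input Double, accumulator {sum, count}, output Double.
public class AverageAggregate implements AggregateFunction<Double, double[], Double> {
    @Override
    public double[] createAccumulator() {
        return new double[] {0.0, 0.0}; // {sum, count}
    }

    @Override
    public double[] add(Double value, double[] acc) {
        acc[0] += value;
        acc[1] += 1;
        return acc;
    }

    @Override
    public Double getResult(double[] acc) {
        return acc[1] == 0 ? 0.0 : acc[0] / acc[1];
    }

    @Override
    public double[] merge(double[] a, double[] b) {
        a[0] += b[0];
        a[1] += b[1];
        return a;
    }
}
```

It would be applied as `data.keyBy(...).window(...).aggregate(new AverageAggregate())`.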

Re: Re: Re: Flink kafka connector for v 1.19.0

2024-05-17 Thread Niklas Wilcke
Hi Hang, thanks for pointing me to the mail thread. That is indeed interesting. Can we maybe ping someone to get this done? Can I do something about it? Becoming a PMC member might be difficult. :) Are there still three PMC votes outstanding? I'm not entirely sure how to properly check who is part

Re: What is the best way to aggregate data over a long window

2024-05-17 Thread gongzhongqiang
Hi Sachin, We can optimize this problem in the following ways: - use org.apache.flink.streaming.api.datastream.WindowedStream#aggregate(org.apache.flink.api.common.functions.AggregateFunction) to reduce the amount of data - use TTL to clean up data that is no longer needed - enable incremental checkpoints -

Re: Get access to unmatching events in Apache Flink Cep

2024-05-17 Thread Biao Geng
Hi Anton, I am afraid that currently there is no such API to access the middle NFA state in your case. For patterns that contain 'within()' condition, the timeout events could be retrieved via TimedOutPartialMatchHandler interface, but other unmatching events would be pruned immediately once they

Re: problem with the heartbeat interval feature

2024-05-16 Thread Hongshun Wang
Hi Thomas, The Debezium docs say: For the connector to detect and process events from a heartbeat table, you must add the table to the PostgreSQL publication specified by the publication.name

Re: SSL Kafka PyFlink

2024-05-16 Thread gongzhongqiang
Hi Phil, The Kafka SSL configuration keys may not be correct. You can refer to the Kafka documentation[1] for the SSL configuration of clients. [1] https://kafka.apache.org/documentation/#security_configclients Best, Zhongqiang Gong Phil Stavridis wrote on Fri, May 17, 2024 at 01:44: > Hi, > > I have a
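
For reference, a hedged Java sketch of passing those standard Kafka client SSL keys through the Kafka source builder; the broker address, topic, and paths are placeholders, and the same keys apply when set as connector properties from PyFlink:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

public class SslKafkaSourceSketch {
    public static KafkaSource<String> build() {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker:9093")      // assumed TLS listener
                .setTopics("my-topic")                   // placeholder topic
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Standard Kafka client SSL keys, passed through unchanged:
                .setProperty("security.protocol", "SSL")
                .setProperty("ssl.truststore.location", "/etc/certs/client.truststore.jks")
                .setProperty("ssl.truststore.password", "changeit")
                // Only needed if the broker requires mutual TLS:
                .setProperty("ssl.keystore.location", "/etc/certs/client.keystore.jks")
                .setProperty("ssl.keystore.password", "changeit")
                .build();
    }
}
```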

Re: [EXTERNAL]Re: Flink kafka connector for v 1.19.0

2024-05-16 Thread Hang Ruan
Hi, Niklas. The kafka connector version 3.2.0[1] is for Flink 1.19 and it already has a vote thread[2]. But there are not enough votes yet. Best, Hang [1] https://issues.apache.org/jira/browse/FLINK-35138 [2] https://lists.apache.org/thread/7shs2wzb0jkfdyst3mh6d9pn3z1bo93c Niklas Wilcke

Re: monitoring message latency for flink sql app

2024-05-16 Thread Hang Ruan
Hi, mete. As Feng Jin said, I think you could make use of the metric `currentEmitEventTimeLag`. Besides that, if you develop your job with the DataStream API, you could add a new operator to handle it by yourself. Best, Hang Feng Jin wrote on Fri, May 17, 2024 at 02:44: > Hi Mete > > You can refer to the

Re: Flink 1.18.1, state recovery on restart

2024-05-16 Thread Yanfei Lei
This looks like the same problem as FLINK-34063 / FLINK-33863; you can try upgrading to 1.18.2. [1] https://issues.apache.org/jira/browse/FLINK-33863 [2] https://issues.apache.org/jira/browse/FLINK-34063 陈叶超 wrote on Thu, May 16, 2024 at 16:38: > > After upgrading to flink 1.18.1, when the job restarts and restores state, we hit the following error: > 2024-04-09 13:03:48 > java.lang.Exception: Exception while

What is the best way to aggregate data over a long window

2024-05-16 Thread Sachin Mittal
Hi, My pipeline step is something like this: SingleOutputStreamOperator reducedData = data .keyBy(new KeySelector()) .window( TumblingEventTimeWindows.of(Time.seconds(secs))) .reduce(new DataReducer()) .name("reduce"); This works fine for secs =

Re: monitoring message latency for flink sql app

2024-05-16 Thread Feng Jin
Hi Mete You can refer to the metrics provided by the Kafka source connector. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/kafka//#monitoring Best, Feng On Thu, May 16, 2024 at 7:55 PM mete wrote: > Hello, > > For an sql application using kafka as

RE: monitoring message latency for flink sql app

2024-05-16 Thread Anton Sidorov
Hello mete. I found this SO article: https://stackoverflow.com/questions/54293808/measuring-event-time-latency-with-flink-cep If I'm not mistaken, you can use the Flink metrics system for operators and get the processing time of an event in an operator. On 2024/05/16 11:54:44 mete wrote: > Hello, > > For an
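
A minimal hedged sketch of that idea: a custom gauge reporting processing time minus event time for the last element seen. The event type and its field names are assumptions:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;

public class LatencyGaugeSketch {
    // Hypothetical event carrying its own event-time field.
    public static class Event {
        public long eventTimestamp;
        public String payload;
    }

    public static class LatencyTracking extends RichMapFunction<Event, Event> {
        private transient volatile long lastLagMs;

        @Override
        public void open(Configuration parameters) {
            // Expose processing-time minus event-time as a custom gauge.
            getRuntimeContext().getMetricGroup()
                    .gauge("eventTimeLagMs", (Gauge<Long>) () -> lastLagMs);
        }

        @Override
        public Event map(Event e) {
            lastLagMs = System.currentTimeMillis() - e.eventTimestamp;
            return e;
        }
    }
}
```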

problem with the heartbeat interval feature

2024-05-16 Thread Thomas Peyric
Hi Flink Community ! I am using : * Flink * Flink CDC posgtres Connector * scala + sbt versions are : * orgApacheKafkaVersion = "3.2.3" * flinkVersion = "1.19.0" * flinkKafkaVersion = "3.0.2-1.18" * flinkConnectorPostgresCdcVersion = "3.0.1" * debeziumVersion = "1.9.8.Final" *

Re: Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-16 Thread John Smith
Hi. No I have not changed the protocol. On Thu, May 16, 2024, 3:20 AM Biao Geng wrote: > Hi John, > > Just want to check, have you ever changed the kafka protocol in your job > after using the new cluster? The error message shows that it is caused by > the kafka client and there is a similar

Re: [EXTERNAL]Re: Flink kafka connector for v 1.19.0

2024-05-16 Thread Niklas Wilcke
Hi Ahmed, are you aware of a blocker? I'm also a bit confused that after Flink 1.19 being available for a month now the connectors still aren't. It would be great to get some insights or maybe a reference to an issue. From looking at the Github repos and the Jira I wasn't able to spot

monitoring message latency for flink sql app

2024-05-16 Thread mete
Hello, For an sql application using kafka as source (and kafka as sink) what would be the recommended way to monitor for processing delay? For example, i want to be able to alert if the app has a certain delay compared to some event time field in the message. Best, Mete

SSL Kafka PyFlink

2024-05-16 Thread Phil Stavridis
Hi, I have a PyFlink job that needs to read from a Kafka topic and the communication with the Kafka broker requires SSL. I have connected to the Kafka cluster with something like this using just Python. from confluent_kafka import Consumer, KafkaException, KafkaError def

Get access to unmatching events in Apache Flink Cep

2024-05-16 Thread Anton Sidorov
Hello! I have a Flink Job with CEP pattern. Pattern example: // Strict Contiguity // a b+ c d e Pattern.begin("a", AfterMatchSkipStrategy.skipPastLastEvent()).where(...) .next("b").where(...).oneOrMore() .next("c").where(...) .next("d").where(...)

Flink 1.18.1, state recovery on restart

2024-05-16 Thread 陈叶超
After upgrading to flink 1.18.1, when the job restarts and restores state, we encounter the following error: 2024-04-09 13:03:48 java.lang.Exception: Exception while creating StreamOperatorStateContext. at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:258) at

Re: Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-16 Thread Biao Geng
Hi John, Just want to check, have you ever changed the kafka protocol in your job after using the new cluster? The error message shows that it is caused by the kafka client and there is a similar error in this issue

Would Java 11 cause Getting OutOfMemoryError: Direct buffer memory?

2024-05-15 Thread John Smith
I deployed a new cluster, same version as my old cluster (1.14.4), the only difference being Java 11, and it seems after a week of usage the below exception happens. The task manager is... 32GB total, and I have ONLY the following memory settings taskmanager.memory.flink.size: 16384m

Confirmation on Lambda Support for UDFs in FlinkSQL / Table API

2024-05-15 Thread Tucker Harvey via user
Hi Flink Community, I’m writing to confirm whether lambda expressions are supported with User Defined Functions (UDFs) in FlinkSQL and the Table API. My current understanding is that they are not supported. Can anyone verify this, or let me know if there have been any recent changes regarding

Re: Re: use flink 1.19 JDBC Driver can't find jdbc connector

2024-05-15 Thread Xuyang
Hi, > Can we use Chinese now? I see you sent this to the Chinese Q&A mailing list. > Just edit the Factory file in the gateway.jar in the opt directory to register the connector. Do you mean that the earlier error was something like "could not find a jdbc connector", and it was fixed by adding the jdbc connector's Factory implementation class to the Factory file (the SPI file) under META-INF/services in the gateway jar? If so, that is a bit strange, because the SPI file of flink-connector-jdbc already lists the relevant classes [1]; in theory, putting the jar in the lib directory should let SPI discover it

Re:Unable to log any data captured from kafka

2024-05-15 Thread Xuyang
Hi, Nida. I'd like to confirm whether there would be any log output if the job is executed directly in the IDE. If there are logs in the IDE but not when the job is submitted, you could check whether the log configuration files in the TM logs are normal. If there are no logs in the IDE either,

Re: use flink 1.19 JDBC Driver can't find jdbc connector

2024-05-15 Thread abc15606
Can we use Chinese now? Just edit the Factory file in the gateway.jar in the opt directory to register the connector. > On May 15, 2024, at 15:36, Xuyang wrote: > > Hi, it looks like your earlier problem was that the jdbc driver could not be found. Could you briefly describe how you resolved it? "Registering the number of connections" is a bit hard to understand. > > If you did have a similar problem and resolved it this way, you could create an improvement Jira issue[1] to help the community track and improve it, thanks! > > [1]

Re: How to contribute a Flink Hologres connector?

2024-05-15 Thread Xuyang
Hi, I think from a pure contribution perspective, supporting a flink hologres connector is fine. Hologres is currently a fairly popular database, so there is certainly plenty of demand, and the aliyun GitHub organization already provides an open-source flink hologres connector [1] based on it. But for the commercial ververica-connector-hologres package from aliyun and similar companies, if you want to open-source it directly, in my view it is best to confirm the following points first, otherwise there may be hidden legal risks: 1.

Re: Re: use flink 1.19 JDBC Driver can't find jdbc connector

2024-05-15 Thread Xuyang
Hi, it looks like your earlier problem was that the jdbc driver could not be found. Could you briefly describe how you resolved it? "Registering the number of connections" is a bit hard to understand. If you did have a similar problem and resolved it this way, you could create an improvement Jira issue[1] to help the community track and improve it, thanks! [1] https://issues.apache.org/jira/projects/FLINK/summary -- Best! Xuyang On 2024-05-10 12:26:22, abc15...@163.com wrote: >I've
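
For context, the SPI registration discussed here lives in a file like the following inside the connector jar; the class name is from flink-connector-jdbc, but treat the exact entry as version-dependent:

```
# META-INF/services/org.apache.flink.table.factories.Factory
org.apache.flink.connector.jdbc.table.JdbcDynamicTableFactory
```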

Unable to log any data captured from kafka

2024-05-15 Thread Fidea Lidea
Hi Team, I've written a Flink job and enabled the slf4j logging mechanism for it. Flow of the Flink job: Kafka source => process datastream elements (transformations) => Kafka sink. It stops logging while processing the datastream. I want to log all data captured from Kafka either in a log file or on the

Re: Job is Failing for every 2hrs - Out of Memory Exception

2024-05-15 Thread Biao Geng
Hi Madan, The error shows that it cannot create new threads. One common reason is that the physical machine does not configure a large enough thread limit(check this SO

Re: how to reduce read times when many jobs read the same kafka topic?

2024-05-14 Thread longfeng Xu
Thanks for your explanation. I'll give it a try. :) Sachin Mittal wrote on Wed, May 15, 2024 at 10:39: > Each separate job would have its own consumer group hence they will read > independently from the same topic and when checkpointing they will commit > their own offsets. > So if any job fails, it will not

Re: Checkpointing while loading causing issues

2024-05-14 Thread gongzhongqiang
Hi Lars, Currently, there is no configuration available to trigger a checkpoint immediately after the job starts in Flink. But we can address this issue from multiple perspectives using the insights provided in this document [1]. [1]

Re: how to reduce read times when many jobs read the same kafka topic?

2024-05-14 Thread Sachin Mittal
Each separate job would have its own consumer group hence they will read independently from the same topic and when checkpointing they will commit their own offsets. So if any job fails, it will not affect the progress of other jobs when reading from Kafka. I am not sure of the impact of network

Job is Failing for every 2hrs - Out of Memory Exception

2024-05-14 Thread Madan D via user
Hello Team, Good morning! We have been running a Flink job with Kafka where it gets restarted every 2 hours with an Out of Memory exception. We tried increasing task manager memory, reducing parallelism, and adding a rate limit to reduce the consumption rate, but regardless, it restarts every
