Re: Flink write ADLS show error: No FileSystem for scheme "file"

2024-06-26 Thread Xiao Xu
Hi Gabor, thanks for the reply. I made sure there is no conflict in Maven: all the Hadoop dependencies are in provided scope, and I checked that my result jar does not contain (src/main/resources/META-INF/services). This is my pom: http://maven.apache.org/POM/4.0.0;

Flink k8s operator starts wrong job config from FlinkSessionJob

2024-06-26 Thread Peter Klauke
Hi all, we're running a session cluster and I submit around 20 jobs to it at the same time by creating FlinkSessionJob Kubernetes resources. After sufficient time, all 20 jobs are created and running healthily. However, it appears that some jobs are started with the wrong config. As a result some

Re: Flink write ADLS show error: No FileSystem for scheme "file"

2024-06-26 Thread Gabor Somogyi
Hi Xiao, I'm not quite convinced that the azure plugin ruined your workload; I would take a look at the dependency graph you have in the pom. Adding multiple deps can conflict in terms of class loader services (src/main/resources/META-INF/services). As an example, you have 2 such dependencies where
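A quick way to check the service-file situation described here is to list every classpath entry that contributes a Hadoop FileSystem service declaration. A minimal diagnostic sketch (not from the thread; the class name is made up):

    import java.net.URL;
    import java.util.Enumeration;

    public class ListFsServices {
        public static void main(String[] args) throws Exception {
            // Each URL printed is a jar (or directory) shipping its own
            // META-INF/services/org.apache.hadoop.fs.FileSystem file; shading
            // without merging these files is a common cause of the
            // "No FileSystem for scheme" error.
            Enumeration<URL> urls = ListFsServices.class.getClassLoader()
                    .getResources("META-INF/services/org.apache.hadoop.fs.FileSystem");
            while (urls.hasMoreElements()) {
                System.out.println(urls.nextElement());
            }
        }
    }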

Re: Re: How to parse Oracle data read via CDC

2024-06-26 Thread Yanquan Lv
Yes, you can. By setting Debezium's decimal.handling.mode [1] parameter you can achieve what you need: convert the values to double or string for handling. [1] https://debezium.io/documentation/reference/1.9/connectors/oracle.html#oracle-numeric-types

Re: Unsubscribe

2024-06-26 Thread wjw_bigd...@163.com
| From | <402987...@qq.com.INVALID> | | Date | 2024-06-26 16:38 | | To | user-zh | | Subject | | ---- Original message ----

Re: Unsubscribe

2024-06-26 Thread 费文杰
On 2024-06-26 15:07:45, "15868861416" <15868861...@163.com> wrote: >Hi, try changing the types of ID and PRICE to NUMBER; on my side, Flink SQL mapped the NUMBER type to Iceberg's decimal correctly. > > >| | >博星 >| >| >15868861...@163.com >| > > > ---- Original message ---- >| From | Yanquan Lv | >| Date | 2024-06-26 14:46 | >| To | | >| Subject | Re:

Re: How to parse Oracle data read via CDC

2024-06-26 Thread 15868861416
Hi, try changing the types of ID and PRICE to NUMBER; on my side, Flink SQL mapped the NUMBER type to Iceberg's decimal correctly. | | 博星 | | 15868861...@163.com | ---- Original message ---- | From | Yanquan Lv | | Date | 2024-06-26 14:46 | | To | | | Subject | Re: Re: How to parse Oracle data read via CDC | Hi, are your ID and PRICE fields of type decimal? Decimal values are displayed as BASE64 by default

Flink write ADLS show error: No FileSystem for scheme "file"

2024-06-26 Thread Xiao Xu
Hi all, I am trying to use Flink to write to Azure Blob Storage (ADLS). I put the flink-azure-fs-hadoop jar in the plugins directory, and when I start my write job it shows: Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "file" at

Re: Re: How to parse Oracle data read via CDC

2024-06-26 Thread Yanquan Lv
Hi, are your ID and PRICE fields of type decimal? Decimal values are displayed as BASE64 by default; you can add the following code to make the output more readable. Map<String, Object> customConverterConfigs = new HashMap<>(); customConverterConfigs.put(JsonConverterConfig.DECIMAL_FORMAT_CONFIG, "numeric"); JsonDebeziumDeserializationSchema schema = new
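The truncated snippet appears to follow the pattern from the Flink CDC documentation; a minimal sketch of the full statement, assuming the two-argument JsonDebeziumDeserializationSchema constructor (includeSchema flag plus converter configs):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.connect.json.JsonConverterConfig;
    // Package is org.apache.flink.cdc.debezium in Flink CDC 3.1+.
    import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

    Map<String, Object> customConverterConfigs = new HashMap<>();
    // Render decimals as plain numbers instead of BASE64-encoded bytes.
    customConverterConfigs.put(JsonConverterConfig.DECIMAL_FORMAT_CONFIG, "numeric");
    JsonDebeziumDeserializationSchema schema =
            new JsonDebeziumDeserializationSchema(false, customConverterConfigs);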

Re: [External] Re:Re: Backpressure issue with Flink Sql Job

2024-06-25 Thread Ashish Khatkar via user
Hi Xuyang, The input records are balanced across subtasks; with debloating buffers enabled, this subtask receives fewer records than the other subtasks. > If the differences among all subtasks are not significant, we might be > encountering an IO bottleneck. In this case, we could try

Re: How to parse Oracle data read via CDC

2024-06-25 Thread wjw_bigd...@163.com
Unsubscribing is really troublesome... I have tried several times without figuring it out. ---- Original message ---- | From | ha.fen...@aisino.com | | Date | 2024-06-25 17:25 | | To | user-zh | | Cc | | | Subject | Re: Re: How to parse Oracle data read via CDC | The data is fine: "ID" "NAME" "ADDTIME" "PRICE" 1 "aa" 2024-6-25 14:21:33 12.22 From: 15868861416 Sent: 2024-06-25 17:19 To:

Re: How to parse Oracle data read via CDC

2024-06-25 Thread 15868861416
Check the data in the DEBEZIUM.PRODUCTS table in Oracle; your parsed output contains strings. | | 博星 | | 15868861...@163.com | ---- Original message ---- | From | ha.fen...@aisino.com | | Date | 2024-06-25 15:54 | | To | user-zh | | Subject | How to parse Oracle data read via CDC | Following the code from the documentation: JdbcIncrementalSource oracleChangeEventSource = new OracleSourceBuilder()

Re: flink kubernetes flink autoscale behavior

2024-06-25 Thread Enric Ott
Thanks, Rion. I roughly understand. I still want to know whether the deployment of Adaptive Scheduler relies on Kubernetes. Are there any cases of deploying Flink Adaptive Scheduler on bare metal machines? Appreciated again. ---- Original message:

How can Flink AsyncWriter rate-limit at a fixed rate? There seems to be a bug here

2024-06-25 Thread jinzhuguang
Flink 1.16.0. I found a related community article; its example is as follows: https://flink.apache.org/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/#rationale-behind-the-ratelimitingstrategy-interface public class TokenBucketRateLimitingStrategy implements RateLimitingStrategy {
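For context, RateLimitingStrategy is the AsyncSink interface the linked post builds on. A rough fixed-rate token-bucket sketch against that interface follows; the method names match the post, but the bookkeeping is illustrative rather than the post's exact Bucket4j-based code:

    import org.apache.flink.connector.base.sink.writer.strategy.RateLimitingStrategy;
    import org.apache.flink.connector.base.sink.writer.strategy.RequestInfo;
    import org.apache.flink.connector.base.sink.writer.strategy.ResultInfo;

    public class TokenBucketRateLimitingStrategy implements RateLimitingStrategy {
        private final long capacity;        // bucket size = max burst
        private final long refillPerSecond; // fixed refill rate
        private double tokens;
        private long lastRefillNanos = System.nanoTime();

        public TokenBucketRateLimitingStrategy(long capacity, long refillPerSecond) {
            this.capacity = capacity;
            this.refillPerSecond = refillPerSecond;
            this.tokens = capacity;
        }

        private void refill() {
            long now = System.nanoTime();
            tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * refillPerSecond);
            lastRefillNanos = now;
        }

        @Override
        public boolean shouldBlock(RequestInfo requestInfo) {
            refill();
            return tokens < requestInfo.getBatchSize(); // block until enough tokens accumulate
        }

        @Override
        public void registerInFlightRequest(RequestInfo requestInfo) {
            tokens -= requestInfo.getBatchSize(); // consume tokens when a request is sent
        }

        @Override
        public void registerCompletedRequest(ResultInfo resultInfo) {}

        @Override
        public int getMaxBatchSize() {
            return (int) capacity;
        }
    }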

Re: flink kubernetes flink autoscale behavior

2024-06-24 Thread Rion Williams
Hi Enric, I believe you might be referring to use of the adaptive scheduler, which should support these "in-place" scaling operations via: jobmanager.scheduler: adaptive. You can see the documentation for Elastic Scaling here for additional details and configuration. On Jun 24, 2024, at 11:56 PM, Enric

Re:Re: Backpressure issue with Flink Sql Job

2024-06-24 Thread Xuyang
Hi, Ashish. Can you confirm whether, on the subtask label page of this sink materializer node, the input records for each subtask are approximately the same? If the input records for subtask number 5 are significantly larger compared to the others, it signifies a serious data skew, and it

Re: Understanding flink-autoscaler behavior

2024-06-24 Thread Zhanghao Chen
You can try session mode with only one job, but still with adaptive scheduler disabled. When stopping a session job, the TMs won't be released immediately and can be reused later. Best, Zhanghao Chen From: Chetas Joshi Sent: Tuesday, June 25, 2024 1:59 To:

flink kubernetes flink autoscale behavior

2024-06-24 Thread Enric Ott
Hello, Community: I've recently started using the Flink Kubernetes Operator, and I'd like to know if CPU and job-parallelism autoscaling are supported without restarting the whole job. If it's supported, please tell me how to configure and deploy it. Thanks.

Re: Understanding flink-autoscaler behavior

2024-06-24 Thread Chetas Joshi
Hello, After disabling the adaptive scheduler, I was able to have the operator stop the job with a savepoint and resume the job from that savepoint after the upgrade. However, I observed that the upgrade lifecycle is quite slow, as it takes down and then brings back up all the task managers. I am

Re: Backpressure issue with Flink Sql Job

2024-06-24 Thread Penny Rastogi
Hi Ashish, Can you check a few things? 1. Is your source broker count also 20 for both topics? 2. You can try increasing the state operation memory and reducing the disk I/O: - Increase the number of CU resources in a single slot. - Set optimization parameters: -

Backpressure issue with Flink Sql Job

2024-06-24 Thread Ashish Khatkar via user
Hi all, We are facing backpressure in the flink sql job from the sink and the backpressure only comes from a single task. This causes the checkpoint to fail despite enabling unaligned checkpoints and using debloating buffers. We enabled flamegraph and the task spends most of the time doing

Re: Reminder: Help required to fix security vulnerabilities in Flink Docker image

2024-06-23 Thread elakiya udhayanan
Hi Alexis and Gabor, Thanks for your valuable response and suggestions. I will try to work on the suggestions and get back to you if I require more details. Thanks, Elakiya On Sun, Jun 23, 2024 at 10:12 PM Gabor Somogyi wrote: > Hi Elakiya, > > I've just double checked the story and it seems like

Re: Reminder: Help required to fix security vulnerabilities in Flink Docker image

2024-06-23 Thread Gabor Somogyi
Hi Elakiya, I've just double checked the story and it seems like the latest 1.17 gosu release is not vulnerable. Can you please try it out on your side? Alexis has written down how you can bump the docker version locally: ---CUT-HERE--- ENV GOSU_VERSION 1.17 ---CUT-HERE--- Please report back and

Re: Reminder: Help required to fix security vulnerabilities in Flink Docker image

2024-06-21 Thread Alexis Sarda-Espinosa
Hi Elakiya, just to be clear, I'm not a Flink maintainer, but here are my 2 cents. I imagine the issues related to Go come from 'gosu', which is installed in the official Flink Docker images. You can see [1] for some thoughts from the gosu maintainer regarding CVEs (and the md file he links).

Reminder: Help required to fix security vulnerabilities in Flink Docker image

2024-06-21 Thread elakiya udhayanan
Hi Team, I would like to send a reminder about the request for help to fix the vulnerabilities seen in the Flink Docker image. Any help is appreciated. Thanks in advance. Thanks, Elakiya U On Tue, Jun 18, 2024 at 12:51 PM elakiya udhayanan wrote: > Hi Community, > > In one of our

Re: A way to meter number of deserialization errors

2024-06-21 Thread Ilya Karpov
I guess metering deserialization errors could be done with Flink metrics, but right now it is missing out of the box. > - I wondered if you might also want to count how many were successfully > parsed in a non-protobuf layer (Dynamic table sort of level) No, this is not a requirement. I have a simple

Re: Elasticsearch 8 - FLINK-26088

2024-06-20 Thread Tauseef Janvekar
Dear Team, I see that the connector version referred to is elasticsearch-3.1.0. But I am not sure where I can get sample code using this artifact or how to download it. Any help is appreciated. Thanks, Tauseef On Tue, 18 Jun 2024 at 18:55, Tauseef Janvekar wrote: > Dear

Re: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns

2024-06-20 Thread 15868861416
Try this example for reference: CREATE TEMPORARY TABLE datagen_source ( a INT, b BIGINT, c STRING, `proc_time` AS PROCTIME() ) WITH ( 'connector'='datagen' ); CREATE TEMPORARY TABLE hbase_dim ( rowkey INT, family1 ROW, family2 ROW, family3 ROW ) WITH ( 'connector'='cloudhbase', 'table-name'='',

Question about concurrentRequests in ElasticsearchWriter

2024-06-20 Thread 장석현
Hi, I'm currently working with the ElasticsearchSink class in the Flink Elasticsearch connector. I noticed that in the createBulkProcessor method, setConcurrentRequests(0) is used, which makes the flush() operation blocking. From my understanding, it seems that even if we set

Re: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns

2024-06-19 Thread xiaohui zhang
When writing, Flink requires that all fields defined in the DDL be written at the same time; SQL that uses only some of the fields is not supported. If you are sure you only need to write part of the data, define only the fields you use in the DDL. zboyu0104 wrote on Fri, Jun 14, 2024, 15:43: > How do I unsubscribe? > from Alibaba Mail iPhone -- > From: 谢县东 > Date: 2024-06-06 16:07:05 > To: > Subject: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns > >

Re: How can Flink dynamically join n of many dimension tables?

2024-06-19 Thread xiaohui zhang
A lookup join can join multiple dimension tables, but dimension-table updates will not trigger a refresh of historical data. When joining multiple dimension tables, you need to consider the latency introduced by the multiple joins, and the pressure the query TPS puts on the dimension-table database. 斗鱼 <1227581...@qq.com.invalid> wrote on Wed, Jun 19, 2024, 23:12: > OK, thanks for the reply. I had heard that Flink's lookup join seems to implement similar logic; I just don't know whether lookup join supports multiple dynamic dimension tables? > > 斗鱼 > 1227581...@qq.com

Re:Re: Checkpoints and windows size

2024-06-19 Thread Feifan Wang
Hi banu: > Not all old sst files are present. A few are removed (I think because of > compaction). You are right: RocksDB implements deleting a key by inserting an entry with a null value, and the space is released after compaction. > Now how can I keep the checkpoint size under control?

Re:Checkpoints and windows size

2024-06-19 Thread Feifan Wang
Hi banu, First of all, it should be noted that the checkpoint interval does not affect the state data live time of the window operator. The life cycle of state data is the same as the life cycle of the tumbling window itself. A checkpoint is a consistent snapshot of the job (including state

Re: How can Flink dynamically join n of many dimension tables?

2024-06-19 Thread xiaohui zhang
Do you need to refresh historical fact tables after a dimension table is updated? Doing that with Flink is almost impossible, especially when multiple dimension tables are involved: every time a dimension table is updated, you would have to find all related records in the entire history and rewrite them. Whether in state storage or update volume, the required resources are too high to handle. In our current real-time wide-table applications, the real-time part is generally transactional flow data, and the dimension values fetched should be the data as of the time the business fact occurred. Refreshing facts after dimension updates is generally done in nightly batches. If there is a hard real-time update requirement, the only option is to join the dimension table for the latest value at query time. 王旭 wrote on Sun, Jun 16, 2024, 21:20: > Happy to exchange ideas; we are doing a similar refactoring >

Re: Checkpoints and windows size

2024-06-19 Thread banu priya
Hi Wang, Thanks a lot for your reply. Currently I have a 2 s window and a checkpoint interval of 10 s. The minimum pause between checkpoints is 5 s. What happens is that my checkpoint size grows gradually. I checked the contents of my RocksDB local dir and also the shared checkpoints directory.

Re: A way to meter number of deserialization errors

2024-06-19 Thread David Radley
Hi Ilya, I have no experience of doing this, but I wonder if we could use Flink Metrics. I wonder: - There could be a hook point at that part of the code to discover some custom code that implements the metrics.

Runtime issue while using statefun-datastream v3.3.0

2024-06-19 Thread RAN JIANG
Hi all, We are trying to leverage statefun datastream features. After adding *org.apache.flink:statefun-flink-datastream:3.3.0* in our gradle file, we are experiencing a runtime error like this, *Caused by: java.lang.NoSuchMethodError: ‘com.google.protobuf.Descriptors$FileDescriptor

Re: A way to meter number of deserialization errors

2024-06-19 Thread Ilya Karpov
Does anybody experience the problem of metering deserialization errors? On Mon, Jun 17, 2024 at 14:39, Ilya Karpov: > Hi all, > we are planning to use flink as a connector between kafka and > external systems. We use protobuf as a message format in kafka. If > non-backward compatible changes

Checkpoints and windows size

2024-06-18 Thread banu priya
Hi All, I have a Flink job with a keyBy, a tumbling window (2 s window, processing time), and an aggregator. How often should I run the checkpoint? I don't need the data to be retained after 2 s. I want to use incremental checkpoints with RocksDB. Thanks Banupriya
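For reference, the setup described later in this thread (RocksDB with incremental checkpoints, a 10 s interval, and a 5 s minimum pause) would be configured roughly like this; a sketch, not the poster's actual code:

    import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStateBackend(new EmbeddedRocksDBStateBackend(true)); // true = incremental checkpoints
    env.enableCheckpointing(10_000);                            // checkpoint every 10 s
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5_000); // at least 5 s between them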

Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Thread Paul Lam
Well done! Thanks a lot for your hard work! Best, Paul Lam > On Jun 19, 2024, at 09:47, Leonard Xu wrote: > > Congratulations! Thanks Qingsheng for the release work and all contributors > involved. > > Best, > Leonard > >> On Jun 18, 2024, at 23:50, Qingsheng Ren wrote: >> >> The Apache Flink community is very

Re: [ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Thread Leonard Xu
Congratulations! Thanks Qingsheng for the release work and all contributors involved. Best, Leonard > On Jun 18, 2024, at 23:50, Qingsheng Ren wrote: > > The Apache Flink community is very happy to announce the release of Apache > Flink CDC 3.1.1. > > Apache Flink CDC is a distributed data integration

[ANNOUNCE] Apache Flink CDC 3.1.1 released

2024-06-18 Thread Qingsheng Ren
The Apache Flink community is very happy to announce the release of Apache Flink CDC 3.1.1. Apache Flink CDC is a distributed data integration tool for real time data and batch data, bringing the simplicity and elegance of data integration via YAML to describe the data movement and transformation

Elasticsearch 8 - FLINK-26088

2024-06-18 Thread Tauseef Janvekar
Dear Team, As per https://issues.apache.org/jira/browse/FLINK-26088, elasticsearch 8 support is already added but I do not see it in any documentation. Also the last version that supports any elasticsearch is 1.17.x. Can I get the steps on how to integrate with elastic 8 and some sample code

[no subject]

2024-06-18 Thread Dat Nguyen Tien

RE: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-18 Thread gwenael . lebarzic
Hello Rob. This workaround works indeed! Regards, Gwenael Le Barzic From: Robert Young Sent: Tuesday, June 18, 2024 03:54 To: LE BARZIC Gwenael DTSI/SI Cc: user@flink.apache.org Subject: Re: Problem reading a CSV file with pyflink datastream in k8s with

Help required to fix security vulnerabilities in Flink Docker image

2024-06-18 Thread elakiya udhayanan
Hi Community, In one of our applications we are using a Flink Docker image and running Flink as a Kubernetes pod. As per policy, we tried scanning the Docker image for security vulnerabilities using JFrog Xray, and we found multiple critical vulnerabilities being reported, as seen in

Re: Flink Stateful Functions 3.4

2024-06-17 Thread Zhanghao Chen
There has been no active maintenance of the StateFun project since the release of v3.3 last September, per the GitHub commit history [1]. So currently there's no estimate on when v3.4 will be released or which Flink version it would support. [1]

Re: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-17 Thread Robert Young
Hi Gwenael, From the logs I thought it was a JVM module opens/exports issue, but I found it had a similar issue using a java8 base image too. I think the issue is that it's not permitted for PythonCsvUtils to call the package-private constructor of CsvReaderFormat across class loaders. One

Re: Problems with multiple sinks using postgres-cdc connector

2024-06-17 Thread Hongshun Wang
Hi David, In your modified pipeline, just one source from table1 is sufficient, with both sink1 and process2 sharing a single source from process1. However, based on your log, it appears that two sources have been generated. Do you have the execution graph available in the Flink UI? Best,

Flink Stateful Functions 3.4

2024-06-17 Thread L. Jiang
Hi there, Does anyone know which Flink version Flink Stateful Functions 3.4 is compatible with? https://nightlies.apache.org/flink/flink-statefun-docs-master/docs/deployment/state-bootstrap/ I know Stateful Functions 3.3 is compatible with Flink 1.16.2, and Stateful Functions 3.2 is good with

Re: Problems with multiple sinks using postgres-cdc connector

2024-06-17 Thread David Bryson
These sinks share the same source. The pipeline that works looks something like this: table1 -> process1 -> process2 -> sink2 When I change it to this: table1 -> process1 -> process2 -> sink2 `--> sink1 I get the errors described, where it appears that a second
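The intended fan-out (one source, two sinks) would normally be expressed by reusing the same intermediate stream, as in this minimal self-contained sketch (stream contents and names are placeholders, not David's pipeline):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FanOutSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<String> source = env.fromElements("a", "b", "c"); // stands in for the CDC source
            DataStream<String> process1 = source.map(s -> "p1:" + s);
            process1.print("sink1");                     // branch 1: process1 -> sink1
            process1.map(s -> "p2:" + s).print("sink2"); // branch 2: process1 -> process2 -> sink2
            env.execute("fan-out sketch");
        }
    }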

A way to meter number of deserialization errors

2024-06-17 Thread Ilya Karpov
Hi all, we are planning to use flink as a connector between kafka and external systems. We use protobuf as a message format in kafka. If non-backward compatible changes occur we want to skip those messages ('protobuf.ignore-parse-errors' = 'true') but record an error and raise an alert. I didn't
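One common way to get such a metric is to wrap the deserialization schema and count failures via the source's metric group. A hypothetical sketch (wrapper and metric names are made up; the null return relies on DeserializationSchema's default collector method skipping null records):

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.metrics.Counter;

    public class ErrorCountingSchema<T> implements DeserializationSchema<T> {
        private final DeserializationSchema<T> inner;
        private transient Counter deserializeErrors;

        public ErrorCountingSchema(DeserializationSchema<T> inner) { this.inner = inner; }

        @Override
        public void open(InitializationContext context) throws Exception {
            inner.open(context);
            // Exposed as a regular Flink metric; alert on it externally.
            deserializeErrors = context.getMetricGroup().counter("deserializeErrors");
        }

        @Override
        public T deserialize(byte[] message) {
            try {
                return inner.deserialize(message);
            } catch (Exception e) {
                deserializeErrors.inc();
                return null; // skipped by the default collector-based deserialize
            }
        }

        @Override
        public boolean isEndOfStream(T nextElement) { return inner.isEndOfStream(nextElement); }

        @Override
        public TypeInformation<T> getProducedType() { return inner.getProducedType(); }
    }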

RE: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-17 Thread gwenael . lebarzic
Hello everyone. Does someone know how to solve this, please? Regards, Gwenael Le Barzic Ingénieur technique techno BigData Orange/OF/DTSI/SI/DATA-IA/SOLID/CXP Mobile: +33 6 48 70 85 75

Re: Problems with multiple sinks using postgres-cdc connector

2024-06-17 Thread Hongshun Wang
Hi David, > When I add this second sink, the postgres-cdc connector appears to add a second reader from the replication log, but with the same slot name. I don't understand what you mean by adding a second sink. Do they share the same source, or does each have a separate pipeline? If the former

Discussion of Flink checkpoint latency performance issues

2024-06-16 Thread 15868861416
Hi all, Background: In a real test of Flink reading Kafka data and writing to Hudi, the checkpoint interval is 1 min and state.backend is filesystem. The test results are as follows: [checkpoint latency when writing to Hudi] [latency when writing to Iceberg] Question: Hudi's checkpoint file data is much larger than Iceberg's; how can the checkpoint latency of Flink writing to Hudi be reduced? | | 博星 | | 15868861...@163.com |

Re: How can Flink dynamically join n of many dimension tables?

2024-06-16 Thread ????
1. left join

Re: How can Flink dynamically join n of many dimension tables?

2024-06-16 Thread 王旭
Are you using the Flink SQL API or the DataStream API? | From | 斗鱼 <1227581...@qq.com.INVALID> | | Date | 2024-06-16 20:35 | | To | user-zh | | Subject | How can Flink dynamically join n of many dimension tables? | Hi:

How can Flink dynamically join n of many dimension tables?

2024-06-16 Thread 斗鱼
Hi all: 1. DWD ... Kafka ... DWD 2. Kafka

Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-16 Thread Junrui Lee
Hi Biao, I agree with you that this exception is not very meaningful and can be noisy in the JM logs, especially when running large-scale batch jobs in a session cluster. IIRC, there isn't a current config to filter out or silence such exceptions in batch mode. So I've created a JIRA ticket (

raw issues

2024-06-16 Thread Fokou Toukam, Thierry
Hello, I am trying to do vector assembling with Flink 1.15 but I get the output below. How can I solve it, please? 2024-06-16 03:47:24 DEBUG Main:114 - Assembled Data Table Schema: root |-- tripId: INT |-- stopId: INT |-- routeId: INT |-- stopSequence: INT |-- speed: DOUBLE |-- currentStatus: INT

Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-15 Thread Geng Biao
Hi Junrui, Thanks for your answer! Since this exception is not very meaningful, is there a solution or a Flink config to filter out or silence such exceptions in batch mode? When I run some large-scale batch jobs in a session cluster, it turns out that the JM log gets filled with this

Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-15 Thread Junrui Lee
Hi, This exception is common in batch jobs and is caused by the collect sink attempting to fetch data from the corresponding operator coordinator on the JM based on the operator ID. However, due to the sequential scheduling of batch jobs, if a job vertex has not been initialized yet, the

Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-15 Thread Corin
When I run a batch job using Flink 1.19, I used collect() in the job, and many times the following error appears in the JobManager log: Caused by: org.apache.flink.util.FlinkException: Coordinator of operator does not exist or the job vertex this operator belongs to is not initialized.

Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-14 Thread gwenael . lebarzic
Hello everyone. I get the following error when trying to read a CSV file with pyflink datastream in a k8s environment using the flink operator. ### File "/opt/myworkdir/myscript.py", line 30, in run_flink_job(myfile) File "/opt/myworkdir/myscript.py", line 21, in run_flink_job

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Gunnar Morling
Hey, I ran into some issues with PyFlink on Kubernetes myself a while ago. Blogged about it here, perhaps it's useful: https://www.decodable.co/blog/getting-started-with-pyflink-on-kubernetes Best, --Gunnar Am Fr., 14. Juni 2024 um 20:58 Uhr schrieb Mate Czagany : > Hi, > > You can refer

RE: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread gwenael . lebarzic
To be sure about that, I can see this in the doc: # install PyFlink COPY apache-flink*.tar.gz / RUN pip3 install /apache-flink-libraries*.tar.gz && pip3 install /apache-flink*.tar.gz Is the result the same as with the command below: RUN pip install --no-cache-dir -r requirements.txt With

RE: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread gwenael . lebarzic
Thank you for your answer, Mate o/ Regards, Gwenael Le Barzic Ingénieur technique techno BigData Orange/OF/DTSI/SI/DATA-IA/SOLID/CXP Mobile: +33 6 48 70 85 75

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Mate Czagany
Oops, forgot the links, sorry about that [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#using-flink-python-on-docker [2] https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-python-example/Dockerfile Mate

Problems with multiple sinks using postgres-cdc connector

2024-06-14 Thread David Bryson
Hi, I have a stream reading from postgres-cdc connector version 3.1.0. I read from two tables: flink.cleaned_migrations public.cleaned I convert the tables into a datastream, do some processing, then write it to a sink at the end of my stream: joined_table_result =

Re: Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread Mate Czagany
Hi, You can refer to the example Dockerfile in the Flink docs [1], and you can also take a look at the example found in the Flink Kubernetes Operator repo [2]. The second Dockerfile won't work because it is missing all the Flink libraries, if I am not mistaken. Regards, Mate ... wrote (at: 2024.

Which base image to use for pyflink on k8s with flink operator ?

2024-06-14 Thread gwenael . lebarzic
Hello everyone. I am contacting you because I'm encountering some strange difficulties with pyflink on Kubernetes using the flink operator. So, first things first: I was wondering which base image I should use for the Python image that I will then deploy on my Kubernetes cluster? Can I use flink

Re: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns

2024-06-14 Thread zboyu0104
How do I unsubscribe? from Alibaba Mail iPhone -- From: 谢县东 Date: 2024-06-06 16:07:05 To: Subject: When inserting data with the HBase connector, how to update only one column when a column family has multiple columns Hi all: Flink version: 1.13.6. I am using the flink-connector-hbase connector to write data to HBase via Flink SQL. The HBase table is created as follows: CREATE TABLE hbase_test_db_test_table_xxd

Re: Flink CDC 3.0 schema change question

2024-06-13 Thread Yanquan Lv
Hi, the DataStream approach requires setting the includeSchemaChanges(true) parameter and configuring a custom deserializer; see this link [1]. If you don't want to use the JSON approach and prefer a custom deserializer, the approach of extracting the DDL from the SourceRecord is described in the solution at this link [2]. [1]
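A minimal sketch of the DataStream-side setup described here, for a MySQL source; connection details are placeholders, and the import package is com.ververica.cdc.* in CDC 3.0 (org.apache.flink.cdc.* from 3.1 on):

    import com.ververica.cdc.connectors.mysql.source.MySqlSource;
    import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;

    MySqlSource<String> source = MySqlSource.<String>builder()
            .hostname("localhost")      // placeholder connection details
            .port(3306)
            .databaseList("mydb")
            .tableList("mydb.orders")
            .username("user")
            .password("pw")
            .includeSchemaChanges(true) // emit DDL change events into the stream
            .deserializer(new JsonDebeziumDeserializationSchema())
            .build();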

Re: Flink CDC 3.0 schema change question

2024-06-12 Thread Xiqian YU
Hi Zapjone, The current Schema Evolution implementation relies on the pipeline connectors and framework that pass CDC events. If you want to insert custom operator logic, I suggest looking at how the FlinkPipelineComposer class in the flink-cdc-composer module builds the operator-chain job, and inserting your custom Operator there to implement your business logic. Also, for some simple processing logic, if it can be expressed with the Route and Transform features of YAML jobs, writing the corresponding YAML rules directly is simpler. Best! Regards, yux From: zapjone

Flink CDC 3.0 schema change question

2024-06-12 Thread zapjone
Hi all: I'd like to ask: Flink CDC 3.0 supports schema changes, but apparently only via the pipeline approach. For business reasons I need to use DataStream for special processing, so how can I use schema changes with the DataStream API in Flink CDC 3.0? Or is there related documentation?

Re: changelogstream delete issue

2024-06-12 Thread Xuyang
Hi, Can you try using a statement set [1] to write this query into a print sink at the same time? You can see the print sink's results in the TM log; check whether there is any -D data in them. If there is none, it proves the test_changelog source table may simply have no -D data; if there is, you need to further investigate the behavioral difference of the sink table between DataStream and SQL. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/insert/#insert-into-multiple-tables [2]
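The suggestion amounts to fanning the same query into both the real sink and a print sink via a statement set; a sketch with placeholder table names and columns:

    import org.apache.flink.table.api.StatementSet;

    // tableEnv is the StreamTableEnvironment from the original job.
    tableEnv.executeSql(
            "CREATE TABLE print_sink (id INT, name STRING) WITH ('connector' = 'print')");
    StatementSet set = tableEnv.createStatementSet();
    set.addInsertSql("INSERT INTO lake_sink SELECT id, name FROM test_changelog");
    set.addInsertSql("INSERT INTO print_sink SELECT id, name FROM test_changelog");
    set.execute(); // TM stdout will show the +I/-U/+U/-D row kinds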

Re: Understanding flink-autoscaler behavior

2024-06-12 Thread Chetas Joshi
Got it. Thanks! On Wed, Jun 12, 2024 at 6:49 PM Zhanghao Chen wrote: > > Does this mean it won't trigger a checkpoint before scaling up or > scaling down? > > The in-place rescaling won't do that. > > > Do I need the autotuning to be turned on for exactly once processing? > > It suffices to

Re: Understanding flink-autoscaler behavior

2024-06-12 Thread Zhanghao Chen
> Does this mean it won't trigger a checkpoint before scaling up or scaling > down? The in-place rescaling won't do that. > Do I need the autotuning to be turned on for exactly once processing? It suffices to just go back to the full-restart upgrade mode provided by the operator: disable

Re: Understanding flink-autoscaler behavior

2024-06-12 Thread Chetas Joshi
Hi Zhanghao, If I am using the autoscaler (flink-k8s-operator) without enabling auto-tuning, the documentation says it triggers in-place scaling without going through a full job upgrade lifecycle. Does this mean it won't trigger a checkpoint before scaling up or scaling down? I am observing that

Re: Slack Community Invite?

2024-06-12 Thread Ufuk Celebi
FYI This was recently fixed on the website, too. > On 28. May 2024, at 05:05, Alexandre Lemaire wrote: > > Thank you! > >> On May 27, 2024, at 9:35 PM, Junrui Lee wrote: >> >> Hi Alexandre, >> >> You can try this link: >>

Re: Setting uid hash for non-legacy sinks

2024-06-12 Thread Salva Alcántara
{ "emoji": "♥️", "version": 1 }

Re: Setting uid hash for non-legacy sinks

2024-06-12 Thread Gabor Somogyi
Hey Salva, Good to hear your issue has been resolved one way or another! Thanks for confirming that this operator hash trick is working on V2 as well. G On Wed, Jun 12, 2024 at 5:20 AM Salva Alcántara wrote: > Hey Gabor, > > I didn't finally need to keep compatibility with existing

Re: Changing TTL config without breaking state

2024-06-12 Thread xiangyu feng
Hi Salva, Unfortunately, the state is currently incompatible between enabling TTL and disabling TTL. This issue is tracked by this jira( https://issues.apache.org/jira/browse/FLINK-32955) and is not resolved yet. Usually, we need to find a way to reaccumulate the state after enabling the state TTL.

Changing TTL config without breaking state

2024-06-11 Thread Salva Alcántara
I have some jobs where I can configure the TTL duration for certain operator state. The problem I'm noticing is that when I make changes in the TTL configuration the new state descriptor becomes incompatible and I cannot restart my jobs from current savepoints. Is that expected? More precisely,
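For concreteness, the incompatibility discussed in this thread arises from the descriptor-level TTL switch: enabling it changes the state serializer layout, so a savepoint taken without the enableTimeToLive line below cannot be restored with it (and vice versa). A sketch with assumed names and duration:

    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;

    StateTtlConfig ttl = StateTtlConfig.newBuilder(Time.hours(24))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .build();
    ValueStateDescriptor<String> desc = new ValueStateDescriptor<>("my-state", String.class);
    desc.enableTimeToLive(ttl); // toggling this on/off breaks savepoint compatibility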

Re: Force to commit kafka offset when stop a job.

2024-06-11 Thread Lei Wang
I tried; it seems the offset will not be committed when doing a savepoint. After submitting a Flink job, I just use the following command to see the committed offset: bin/kafka-consumer-groups.sh --bootstrap-server xxx:9092 --describe --group groupName TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
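For context, KafkaSource commits offsets to the consumer group only as part of completed checkpoints; a sketch of the relevant builder settings (topic and group values taken from the command above, the rest assumed):

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.kafka.clients.consumer.OffsetResetStrategy;

    KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("xxx:9092")
            .setTopics("my-topic")          // assumed topic name
            .setGroupId("groupName")
            .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
            .setProperty("commit.offsets.on.checkpoint", "true") // default; commits only on checkpoint
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();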

[ANNOUNCE] Apache flink-connector-opensearch 2.0.0 released

2024-06-11 Thread Sergey Nuyanzin
The Apache Flink community is very happy to announce the release of Apache flink-connector-opensearch 2.0.0. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for

[ANNOUNCE] Apache flink-connector-opensearch 1.2.0 released

2024-06-11 Thread Sergey Nuyanzin
The Apache Flink community is very happy to announce the release of Apache flink-connector-opensearch 1.2.0 for Flink 1.18 and Flink 1.19. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Zhanghao Chen
There's no such option yet. However, it might not be a good idea to silently ignore the exception and restart from fresh state, which violates data integrity. Instead, the job should be marked as terminally failed in this case (maybe after a few retries) and just leave users or an external

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-11 Thread Jean-Marc Paulin
Thanks for your reply. Yes, this is indeed an option. But I was more after a config option to handle that scenario. If the HA metadata points to a checkpoint that is obviously not present (err 404 in the S3 case), there is little value in retrying. The HA data are obviously worthless in that

Re: Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Zhanghao Chen
Hi, In this case, you could cancel the job using the flink stop command, which will clean up the Flink HA metadata, and resubmit the job. Best, Zhanghao Chen From: Jean-Marc Paulin Sent: Monday, June 10, 2024 18:53 To: user@flink.apache.org Subject: Failed to

changelogstream delete issue

2024-06-10 Thread zapjone
Hi all: I use the DataStream API to read MySQL data in real time, convert the DataStream into a changelog table via tableEnv.fromChangelogStream, and then use SQL to write the changelog table data into a data lake. In testing, insert and update both work correctly, but delete does not actually delete; when testing with pure SQL, the delete works. (Some logic requires the API, so the pure-SQL approach was not used.) Code: StreamExecutionEnvironment env = ...; StreamTableEnvironment tableEnv = ...; MySqlSource

Failed to resume from HA when the checkpoint has been deleted.

2024-06-10 Thread Jean-Marc Paulin
Hi, We have a 1.19 Flink streaming job with HA enabled (ZooKeeper) and checkpoints/savepoints in S3. We had an outage and now the jobmanager keeps restarting. We think it is because it reads the job id to be restarted from ZooKeeper, but because we lost our S3 storage as part of the outage it cannot

Re: Setting uid hash for non-legacy sinks

2024-06-10 Thread Gabor Somogyi
YW, ping me back whether it works because it's a nifty feature. G On Mon, Jun 10, 2024 at 9:26 AM Salva Alcántara wrote: > Thanks Gabor, I will give it a try! > > On Mon, Jun 10, 2024 at 12:01 AM Gabor Somogyi > wrote: > >> Now I see the intention and then you must have a V2 sink, right?

Re: Setting uid hash for non-legacy sinks

2024-06-10 Thread Salva Alcántara
Thanks Gabor, I will give it a try! On Mon, Jun 10, 2024 at 12:01 AM Gabor Somogyi wrote: > Now I see the intention and then you must have a V2 sink, right? Maybe you > look for the following: > > final String writerHash = "f6b178ce445dc3ffaa06bad27a51fead"; > final String committerHash =

Re: How to Use AsyncLookupFunction

2024-06-10 Thread Krzysztof Chmielewski
Hi there Clemens. This one might be a little bit tricky. You can take a look at https://github.com/getindata/flink-http-connector/blob/main/src/main/java/com/getindata/connectors/http/internal/table/lookup/AsyncHttpTableLookupFunction.java We implemented our http connector using asyncLookup

How to Use AsyncLookupFunction

2024-06-09 Thread Clemens Valiente
hi, how do I implement AsyncLookupFunctions correctly? I implemented an AsyncLookupFunction; the eval function has the following signature: https://nightlies.apache.org/flink/flink-docs-release-1.17/api/java/org/apache/flink/table/functions/AsyncLookupFunction.html eval(CompletableFuture<Collection<RowData>> future,
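In recent Flink versions the final eval(CompletableFuture, Object...) bridge is provided by the AsyncLookupFunction base class, and implementations override asyncLookup instead; a stub sketch (class name and lookup body are hypothetical):

    import java.util.Collection;
    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.table.data.GenericRowData;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.data.StringData;
    import org.apache.flink.table.functions.AsyncLookupFunction;

    public class MyAsyncLookup extends AsyncLookupFunction {
        @Override
        public CompletableFuture<Collection<RowData>> asyncLookup(RowData keyRow) {
            return CompletableFuture.supplyAsync(() -> {
                // Replace with a real non-blocking client call keyed by keyRow.
                Collection<RowData> rows = Collections.singletonList(
                        GenericRowData.of(keyRow.getInt(0), StringData.fromString("stub")));
                return rows;
            });
        }
    }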
