[Request Help] flinkcdc start with error java.lang.NoClassDefFoundError: org/apache/flink/cdc/common/sink/MetadataApplier

2024-07-26 Thread 424767284
hi: follow the guide of https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/get-started/quickstart/mysql-to-doris/ and start with flink-cdc.sh and get an error error java.lang.NoClassDefFoundError: org/apache/flink/cdc/common/sink/MetadataApplier is there anything wrong

Re: Tuning rocksdb configuration

2024-07-26 Thread Zakelly Lan
Hi Banupriya, Sometimes a sst will not be compacted and will be referenced for a long time. That depends on how rocksdb picks the files for compaction. It may happen when some range of keys is never touched at some point of time, since the rocksdb only takes care of the files or key range that

Re: Tuning rocksdb configuration

2024-07-26 Thread Zakelly Lan
Hi Banu, I'm trying to answer your question in brief: 1. Yes, when the memtable reaches the value you configured, a flush will be triggered. And no, sst files have different format with memtables, the size is smaller than 64mb IIUC. 2. Typically you don't need to change this value. If it is set

Access to S3 - checkpoints

2024-07-25 Thread Sigalit Eliazov
Hi, We are using Ceph buckets to store the checkpoints and savepoints, and the access is done via the S3 protocol. Since we don't have any integration with Hadoop, we added a dependency on flink-s3-fs-presto. Our Flink configuration looks like this: state.checkpoint-storage:

Tuning rocksdb configuration

2024-07-25 Thread banu priya
Hi All, I have a flink job with RMQ Source, filters, tumbling window(uses processing time fires every 2s), aggregator, RMQ Sink. Enabled incremental rocksdb checkpoints for every 10s with minimum pause between checkpoints as 5s. My checkpoints size is keep on increasing , so I am planning to tune

Setting web.submit.enable to false doesn't allow flinksessionjobs to work when running in Kubernetes

2024-07-24 Thread Ralph Blaise
Setting web.submit.enable to false in a flinkdeployment deployed to kubernetes doesn't allow flinksessionjobs for it to work. It instead result in the error below:

Re: Troubleshooting checkpoint expiration

2024-07-23 Thread Alexis Sarda-Espinosa
Hi again, I found a Hadoop class that can log latency information [1], but since I don't see any exceptions in the logs when a checkpoint expires due to timeout, I'm still wondering if I can change other log levels to get more insights, maybe somewhere in Flink's file system abstractions? [1]

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-23 Thread Salva Alcántara
Hi all, Just to share my findings so far. Regarding tweaking the setting, it has been impossible for me to do so. So, the only way to work around this has been to duplicate some Flink code directly to allow me to do the tweak. More precisely, this is how my code looks like now (kudos to my dear

Re: flight buffer local storage

2024-07-22 Thread Enric Ott
Thanks,Zhanghao. I think it's the async upload mechanism helped mitigating the in flight buffers materialization latency,and the execution vertex restarting procedure just reads the in flight buffers and the local TaskStateSnapshots to make its job done.

Re: Flink state

2024-07-22 Thread Saurabh Singh
Hi Banu, Rocksdb is intelligently built to clear any un-useful state from its purview. So you should be good and any required cleanup will be automatically done by RocksDb itself. >From the current documentation, it looks quite hard to relate Flink Internal DS activity to RocksDB DS activity. In

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-22 Thread Salva Alcántara
The same happens with this slight variation: ``` Configuration config = new Configuration(); config.setString("collect-sink.batch-size.max", "100mb"); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.configure(config); SavepointReader savepoint =

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-22 Thread Salva Alcántara
Hi Zhanghao, Thanks for your suggestion. Unfortunately, this does not work, I still get the same error message: ``` Record size is too large for CollectSinkFunction. Record size is 9623137 bytes, but max bytes per batch is only 2097152 bytes. Please consider increasing max bytes per batch value

Re: Flink Slot request bulk is not fulfillable!

2024-07-22 Thread Saurabh Singh
Hi Li, The error suggests that Job is not able to acquire the required TaskManager TaskSlots within the configured time duration of 5 minutes. Job Runs on the TaskManagers (Worker Nodes). Helpful Link -

Flink Slot request bulk is not fulfillable!

2024-07-21 Thread Li Shao
Hi All, We are using flink batch mode to process s3 files. However, recently we are seeing the errors like: Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate

Re: Flink state

2024-07-21 Thread banu priya
Dear Community, Gentle reminder about my below email. Thanks Banu On Sat, 20 Jul, 2024, 4:37 pm banu priya, wrote: > Hi All, > > I have a flink job with RMQ Source, filters, tumbling window(uses > processing time fires every 2s), aggregator, RMQ Sink. > > I am trying to understand about

Re: flight buffer local storage

2024-07-21 Thread Zhanghao Chen
By default, Flink uses aligned checkpoint where we wait for all in-flight data before the barriers to be fully processed and then make the checkpoints. There's no in need to store in-flight buffers in this case at the cost of additional barrier alignment, which may take a long time at the

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-21 Thread Zhanghao Chen
Hi, you could increase it as follows: Configuration config = new Configuration(); config.setString(collect-sink.batch-size.max, "10mb"); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config); From: Salva Alcántara Sent:

Flink state

2024-07-20 Thread banu priya
Hi All, I have a flink job with RMQ Source, filters, tumbling window(uses processing time fires every 2s), aggregator, RMQ Sink. I am trying to understand about states and checkpoints(enabled incremental rocksdb checkpoints). In local rocks db directory, I have .sst files, log, lock, options

SavepointReader: Record size is too large for CollectSinkFunction

2024-07-20 Thread Salva Alcántara
Hi all! I'm trying to debug a job via inspecting its savepoints but I'm getting this error message: ``` Caused by: java.lang.RuntimeException: Record size is too large for CollectSinkFunction. Record size is 9627127 bytes, but max bytes per batch is only 2097152 bytes. Please consider increasing

Re: 如何基于FLIP-384扩展对业务数据全链路延时情况的监控

2024-07-19 Thread YH Zhu
退订 Yubin Li 于2024年7月18日周四 14:23写道: > Hi, all > > 目前FLIP-384[1]支持了检查点、任务恢复的trace可观测,但实际业务场景中常需要监测每条业务数据在数据链路的各个节点流转过程中的延时情况,请问有什么好的思路吗 > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces > > Best,

Troubleshooting checkpoint expiration

2024-07-19 Thread Alexis Sarda-Espinosa
Hello, We have a Flink job that uses ABFSS for checkpoints and related state. Lately we see a lot of exceptions due to expiration of checkpoints, and I'm guessing that's an issue in the infrastructure or on Azure's side, but I was wondering if there are Flink/Hadoop Java packages that log

Re: Event de duplication in flink with rabbitmq connector

2024-07-18 Thread Ahmed Hamdy
Yes, The current implementation doesn't leverage transactions on publish like it does for the source on acking and nacking the deliveries, you can raise a ticket to support exactly once RMQSinks within the community or implement the logic yourself. my checkpoints size is increasing. can this

Re: flink 任务运行抛ClassNotFoundException

2024-07-18 Thread Yanquan Lv
你好, 假设 xxx.shade. 是你用于 shade 的前缀。 grep -rn 'org.apache.hudi.com.xx.xx.xxx.A' 和grep -rn 'xxx.shade.org.apache.hudi.com.xx.xx.xxx.A' 出来的结果一致吗? ℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> 于2024年7月18日周四 20:14写道: > 您好,感谢您的回复。 > 我理解应该是都做了 shade 处理,我这边用了您的 grep -rn 命令查看了下没问题。而且,这个 >

Re: Event de duplication in flink with rabbitmq connector

2024-07-18 Thread Ahmed Hamdy
Hi Banu, This behavior of source is expected, the guarantee of the RMQSource is exactly once which is achieved by acknowledging envelopes on checkpoints hence the source would never re-read a message after checkpoint even if it was still inside the pipeline and not yet passed to sink, eager

Re: flink 任务运行抛ClassNotFoundException

2024-07-18 Thread Yanquan Lv
你好,这个类被 shade 了,但是调用这个类的其他类可能在不同的 jar 包,没有都被 shade 处理。可以 grep -rn 'org.apache.hudi.com.xx.xx.xxx.A' 看看所有调用这个类的包是不是都做了 shade 处理。 ℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> 于2024年7月18日周四 18:31写道: > 请问,Flink 任务运行期间 偶尔会抛出 ClassNotFoundException 异常,这个一般是什么原因,以及怎么解决呢?信息如下: > * 这个类确实存在于 任务Jar 里面 > * 这个类是经过

Read avro files with wildcard

2024-07-18 Thread irakli.keshel...@sony.com
Hi all, I have a Flink application where I need to read in AVRO files from s3 which are partitioned by date and hour. I need to read in multiple dates, meaning I need to read files from multiple folders. Does anyone know how I can do this? My application is written in Scala using Flink 1.17.1.

Re: Event de duplication in flink with rabbitmq connector

2024-07-18 Thread banu priya
Hi All, Gentle reminder about bow query. Thanks Banu On Tue, 9 Jul, 2024, 1:42 pm banu priya, wrote: > Hi All, > > I have a Flink job with a RMQ source, tumbling windows (fires for each > 2s), an aggregator, then a RMQ sink. Incremental RocksDB checkpointing is > enabled with an interval of 5

如何基于FLIP-384扩展对业务数据全链路延时情况的监控

2024-07-18 Thread Yubin Li
Hi, all 目前FLIP-384[1]支持了检查点、任务恢复的trace可观测,但实际业务场景中常需要监测每条业务数据在数据链路的各个节点流转过程中的延时情况,请问有什么好的思路吗 [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces Best, Yubin

Re:回复:回复:使用hive的catalog问题

2024-07-17 Thread Xuyang
Hi, 我试了下,flink-connector-kafka-3.2.0-1.19.jar需要替换成flink-sql-connector-kafka-3.2.0-1.19.jar , 下载地址在文档[1]里的sql client那一列下面,这个包里面是有OffsetResetStrategy的。 你能用这个包再试一下吗? [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/kafka/#dependencies -- Best!

Re: Buffer Priority

2024-07-17 Thread Enric Ott
Oh,it's designed for unaligned checkpoint. Thanks,Zhanghao. --Original-- From: "Zhanghao Chen"

Unsubscribe

2024-07-17 Thread Phil Stavridis
Unsubscribe

Re:Event-Driven Window Computation

2024-07-17 Thread Xuyang
Hi, As far as I know, the community currently has no plans to support custom triggers on Flink SQL, because it is difficult to describe triggers using SQL. You can create a jira[1] for it and restart the discussion in dev maillist. [1] https://issues.apache.org/jira/projects/FLINK

Event-Driven Window Computation

2024-07-17 Thread liu ze
Hi, Currently, Flink's windows are based on time (or a fixed number of elements). I want to trigger window computation based on specific events (marked within the data). In the DataStream API, this can be achieved using GlobalWindow and custom triggers, but how can it be done in Flink SQL?

RE: Trying to read a file from S3 with flink on kubernetes

2024-07-17 Thread gwenael . lebarzic
Hello everyone. In fact, the problem was coming from FileSystem.get() : ### val fs = FileSystem.get(hadoopConfig) ### When you want to interact with S3, you need to add a first parameter, before the hadoop config, to specify the filesystem. Something like this : ### val s3uri =

flight buffer local storage

2024-07-17 Thread Enric Ott
Hello,Community: Why doesn't flink store in flight buffers to local disks when it checkpoints? Thanks.

Re: Encountering scala.matchError in Flink 1.18.1 Query

2024-07-17 Thread Norihiro FUKE
Hi Xuyang, Thank you for the information regarding the bug fix. I will proceed with the method of joining input_table and udtf first. Thank you for the suggestion. Best regards, Norihiro Fuke. 2024年7月15日(月) 10:43 Xuyang : > Hi, this is a bug fixed in >

回复:回复:使用hive的catalog问题

2024-07-17 Thread 冯奇
flink1.19,hive3.1.2 使用新参数创建表 CREATE TABLE mykafka (name String, age Int) WITH ( 'connector' = 'kafka', 'topic' = 'test', 'properties.bootstrap.servers' = 'localhost:9092', 'properties.group.id' = 'testGroup', 'scan.startup.mode' = 'earliest-offset', 'format' = 'csv' );

Re: 通过 InputFormatSourceFunction 实现flink 实时读取 ftp 的文件时,获取下一个 split 切片失败,

2024-07-16 Thread YH Zhu
退订 Px New <15701181132mr@gmail.com> 于2024年7月16日周二 22:52写道: > 通过老的API 也就是 InputFormatSourceFunction、InputFormat > 实现了一版,但发现第一批文件(任务启动时也已存在的文件)会正常处理,但我新上传文件后,这里一直为空,有解决思路吗?请问 > > [image: image.png] > > 或者有其他实现 ftp 目录实时读取的实现吗?尽可能满足 > 1. 实时读取 ftp 文件 > 2. 支持持续监测目录及递归子目录与文件3. > 3.

通过 InputFormatSourceFunction 实现flink 实时读取 ftp 的文件时,获取下一个 split 切片失败,

2024-07-16 Thread Px New
通过老的API 也就是 InputFormatSourceFunction、InputFormat 实现了一版,但发现第一批文件(任务启动时也已存在的文件)会正常处理,但我新上传文件后,这里一直为空,有解决思路吗?请问 [image: image.png] 或者有其他实现 ftp 目录实时读取的实现吗?尽可能满足 1. 实时读取 ftp 文件 2. 支持持续监测目录及递归子目录与文件3. 3. 支持并行读取以及大文件的切分 4. 文件种类可能有 json、txt、zip 等,支持读取不同类型文件内的数据 5. 支持断点续传以及状态的保存

Re: 回复:使用hive的catalog问题

2024-07-16 Thread Feng Jin
上面的示例好像使用的旧版本的 kafka connector 参数。 参考文档使用新版本的参数: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_catalog/#step-4-create-a-kafka-table-with-flink-sql-ddl 需要把 kafka 的 connector [1] 也放入到 lib 目录下。 [1]

Re:回复:使用hive的catalog问题

2024-07-16 Thread Xuyang
lib目录下,需要放置一下flink-sql-connector-hive-3.1.3,这个包是给sql作业用的 -- Best! Xuyang 在 2024-07-16 13:40:23,"冯奇" 写道: >我看了下文档,几个包都在,还有一个单独下载依赖的包flink-sql-connector-hive-3.1.3,不知道是使用这个还是下面的? >// Flink's Hive connector flink-connector-hive_2.12-1.19.1.jar // Hive >dependencies

回复:使用hive的catalog问题

2024-07-15 Thread 冯奇
我看了下文档,几个包都在,还有一个单独下载依赖的包flink-sql-connector-hive-3.1.3,不知道是使用这个还是下面的? // Flink's Hive connector flink-connector-hive_2.12-1.19.1.jar // Hive dependencies hive-exec-3.1.0.jar libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately // add

Re:使用hive的catalog问题

2024-07-15 Thread Xuyang
Hi, 可以check一下是否将hive sql connector的依赖[1]放入lib目录下或者add jar了吗? [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/overview/ -- Best! Xuyang At 2024-07-15 17:09:45, "冯奇" wrote: >Flink SQL> USE CATALOG myhive; >Flink SQL> CREATE TABLE mykafka

RE: Taskslots usage

2024-07-15 Thread Alexandre KY
Hello, Thank you for you answers, I now understand Flink's behavior. Thank you and best regards, Ky Alexandre De : Aleksandr Pilipenko Envoyé : vendredi 12 juillet 2024 19:42:06 À : Alexandre KY Cc : user Objet : Re: Taskslots usage Hello Alexandre, Flink

使用hive的catalog问题

2024-07-15 Thread 冯奇
Flink SQL> USE CATALOG myhive; Flink SQL> CREATE TABLE mykafka (name String, age Int) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'hive_sink', 'connector.properties.bootstrap.servers' = '10.0.15.242:9092', 'format.type' = 'csv', 'update-mode' =

Expose rocksdb options for flush thread.

2024-07-14 Thread Zhongyou Lee
Hellow everyone : Up to now, To adjuest rocksdb flush thread the only way is implement ConfigurableRocksDBOptionsFactory #setMaxBackgroundFlushes by user. I found FLINK-22059 to solve this problem. The pr has never been executed, i want to finish this pr. Can anyone assignee this pr

Re:来自kingdomad的邮件

2024-07-14 Thread 张胜军
R 发自139邮箱 The following is the content of the forwarded email From:kingdomad To:user-zh Date:2024-07-15 09:36:43 Subject:来自kingdomad的邮件 (无)

Re:Encountering scala.matchError in Flink 1.18.1 Query

2024-07-14 Thread Xuyang
Hi, this is a bug fixed in https://github.com/apache/flink/pull/25075/files#diff-4ee2dd065d2b45fb64cacd5977bec6126396cc3b56e72addfe434701ac301efeL405. You can try to join input_table and udtf first, and then use it as the input of window tvf to bypass this bug. -- Best! Xuyang

Re:来自kingdomad的邮件

2024-07-14 Thread kingdomad
-- kingdomad At 2024-07-15 09:36:43, "kingdomad" wrote: >

来自kingdomad的邮件

2024-07-14 Thread kingdomad

回复:Flink Standalone-ZK-HA模式下,CLi任务提交

2024-07-13 Thread love_h1...@126.com
猜测是两个JM同时都在向ZK的rest_service_lock节点上写入自身地址,导致Flink客户端的任务有的提交到了一个JM,另一些任务提交到了另一个JM 通过手动修改ZK节点可以复现上述情况。 无法只通过重启ZK完全复现当时的集群, 不清楚上述情况的根本原因,是否有相似BUG出现 回复的原邮件 | 发件人 | Zhanghao Chen | | 日期 | 2024年07月13日 12:41 | | 收件人 | user-zh@flink.apache.org | | 抄送至 | | | 主题 | Re: Flink

Re: Buffer Priority

2024-07-12 Thread Zhanghao Chen
Hi Enric, It basically means the prioritized buffers can bypass all non-prioritized buffers at the input gate and get processed first. You may refer to https://issues.apache.org/jira/browse/FLINK-19026 for more details where it is firstly introduced. Best, Zhanghao Chen

Re: Flink Standalone-ZK-HA模式下,CLi任务提交

2024-07-12 Thread Zhanghao Chen
从日志看,ZK 集群滚动的时候发生了切主,两个 JM 都先后成为过 Leader,但是并没有同时是 Leader。 Best, Zhanghao Chen From: love_h1...@126.com Sent: Friday, July 12, 2024 17:17 To: user-zh@flink.apache.org Subject: Flink Standalone-ZK-HA模式下,CLi任务提交 版本:Flink 1.11.6版本,Standalone HA模式,ZooKeeper 3.5.8版本

Re: Taskslots usage

2024-07-12 Thread Aleksandr Pilipenko
Hello Alexandre, Flink does not use TaskSlot per each task by default, but rather task slot will hold a slice of the entire pipeline (up to 1 subtasks of each operator, depending on the operator parallelism) [1]. So if your job parallelism is 1 - only a single task slot will be occupied. If you

Re: Taskslots usage

2024-07-12 Thread Saurabh Singh
Hi Ky Alexandre, I would recommend reading this section which explains slot sharing b/w tasks. Link Quote - By default, Flink allows subtasks to share slots even if they are >

Re: Flink reactive deployment on with kubernetes operator

2024-07-11 Thread Enric Ott
Thanks,nacisimsek.I will try your suggestion. --Original-- From: "nacisimsek"

?????? Flink reactive deployment on with kubernetes operator

2024-07-11 Thread Enric Ott
Thanks,Gyula.I agree with you on Autoscaler,and I will try the latest Flink Operator version. ---- ??: "Gyula F??ra"

flink-runtime:1.14.6????????????

2024-07-11 Thread ??????
flinkflink-runtime:1.14.6 2024-07-10 16:48:09.700 WARN [XNIO-1 task-8-SendThread(102.195.8.107:2181)] org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [1164] Session 0x0 for server 102.195.8.107/102.195.8.107:2181, unexpected

Re: Can we use custom serialization/deserialization for kafka sources and sinks through the table api?

2024-07-11 Thread Kevin Lam via user
Hi Gabriel, You could consider overriding the value.serializer and value.deserializer (and similar for key) in the consumer and producer

Can we use custom serialization/deserialization for kafka sources and sinks through the table api?

2024-07-11 Thread Gabriel Giussi
Reading from a kafka topic with custom serialization/deserialization can be done using a KafkaSource configured with an implementation of KafkaRecordDeserializationSchema, which has access even to kafka headers which are used in my case for checking message integrity. How can we do the same but

Trying to read a file from S3 with flink on kubernetes

2024-07-11 Thread gwenael . lebarzic
Hey guys. I'm trying to read a file from an internal S3 with flink on Kubernetes, but get a strange blocking error. Here is the code : MyFlinkJob.scala : ### package com.example.flink import org.apache.flink.api.common.serialization.SimpleStringSchema import

Re: Flink reactive deployment on with kubernetes operator

2024-07-11 Thread nacisimsek
Hi Enric, You can try using persistent volume claim on your kubernetes cluster as a JobResultStore, instead of using a local path from your underlying host, and see if it works. apiVersion: v1 kind: PersistentVolumeClaim metadata: name: flink-data-pvc spec: resources: requests:

Kubernetes HA checkpoint not retained on termination

2024-07-11 Thread Clemens Valiente
hi, I have a problem that Flink deletes checkpoint information on kubernetes HA setup even if execution.checkpointing.externalized-checkpoint-retention=RETAIN_ON_CANCELLATION is set. config documentation: "RETAIN_ON_CANCELLATION": Checkpoint state is kept when the owning job is cancelled or fails.

回复:Flink在HA模式,重启ZK集群,客户端任务提交异常

2024-07-11 Thread wjw_bigd...@163.com
退订 回复的原邮件 | 发件人 | love_h1...@126.com | | 日期 | 2024年07月11日 16:10 | | 收件人 | user-zh@flink.apache.org | | 抄送至 | | | 主题 | Flink在HA模式,重启ZK集群,客户端任务提交异常 | 问题现象: Flink 1.11.6版本,Standalone HA模式, 滚动重启了ZK集群;在Flink集群的一个节点上使用flink run 命令提交多个任务; 部分任务提交失败,异常信息如下:

Flink在HA模式,重启ZK集群,客户端任务提交异常

2024-07-11 Thread love_h1...@126.com
问题现象: Flink 1.11.6版本,Standalone HA模式, 滚动重启了ZK集群;在Flink集群的一个节点上使用flink run 命令提交多个任务; 部分任务提交失败,异常信息如下: [Flink-DispatcherRestEndpoint-thread-2] - [WARN ] - [org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.createRpcInvocationMessage(line:290)] - Could not create remote rpc invocation

Re: Flink reactive deployment on with kubernetes operator

2024-07-11 Thread Gyula Fóra
Hi Eric! The community cannot support old versions of the Flink operator, please upgrade to the latest version (1.9.0) Also, we do not recommend using the Reactive mode (with standalone). You should instead try Native Mode + Autoscaler which works much better in most cases. Cheers, Gyula

Flink reactive deployment on with kubernetes operator

2024-07-10 Thread Enric Ott
Hi,Community: I hava encountered a problemwhen deploy reactive flink scheduler on kubernetes with flink kubernetes operator 1.6.0,the manifest and exception stack info listed as follows. Any clues would be appreciated.

RE: Flink Serialisation

2024-07-10 Thread Alexandre KY
After taking a closer look to the logs, I found out it was a `java.lang.OutOfMemoryError: Java heap space` error which confirms what I thought: the serialized object is too big. Here is the solution to increase the JVM heap:

Flink Serialisation

2024-07-10 Thread Alexandre KY
Hello, I was wondering if Flink has a size limit to serialize data. I have an object that stores a big 2D array and when I try to hand it over the next operator, I have the following error: ``` 2024-07-10 10:14:51,983 ERROR org.apache.flink.runtime.util.ClusterUncaughtExceptionHandler [] -

Buffer Priority

2024-07-10 Thread Enric Ott
Hello,Community: I am puzzled by what the Priority means in Flink Buffer,it explains with example(as follows) in Buffer.java,but I still don't get what exactly is "it skipped buffers"??Could anyone give me a intuitive explanation? Thanks. /** Same as EVENT_BUFFER, but the event has been

Re:Re:Flink LAG-Function doesn't work as expected

2024-07-10 Thread Xuyang
Sorry, I mean "could not". -- Best! Xuyang 在 2024-07-10 15:21:48,"Xuyang" 写道: Hi, which Flink version does you use? I could re-produce this bug in master. My test sql is below: ``` CREATE TABLE UNITS_DATA( proctime AS PROCTIME() , `IDENT` INT

Re:Flink LAG-Function doesn't work as expected

2024-07-10 Thread Xuyang
Hi, which Flink version does you use? I could re-produce this bug in master. My test sql is below: ``` CREATE TABLE UNITS_DATA( proctime AS PROCTIME() , `IDENT` INT , `STEPS_ID` INT , `ORDERS_ID` INT ) WITH ( 'connector' = 'datagen',

Event de duplication in flink with rabbitmq connector

2024-07-09 Thread banu priya
Hi All, I have a Flink job with a RMQ source, tumbling windows (fires for each 2s), an aggregator, then a RMQ sink. Incremental RocksDB checkpointing is enabled with an interval of 5 minutes. I was trying to understand Flink failure recovery. My checkpoint X is started, I have sent one event to

Flink LAG-Function doesn't work as expected

2024-07-09 Thread Brandl, Johann
Hi everyone, i'm new to flink and tried some queries with flink sql. Currently I have a problem with the LAG function. I want to emit a new record when the ORDERS_ID changes. To do this, I use the LAG function to detect whether this has changed. However, I noticed that every now and then I

Using BlobServer in FlinkDeployment

2024-07-09 Thread Saransh Jain
Hi all, I am deploying a FlinkDeployment CR in an Operator watched namespace. I have passed these configs in the flinkConfiguration: blob.server.address: "jobmanager" blob.server.port: "6128" blob.storage.directory: "/tmp/jars/" There are a couple of jars that I don't want to make part of the

Encountering scala.matchError in Flink 1.18.1 Query

2024-07-08 Thread Norihiro FUKE
Hi, community I encountered a scala.matchError when trying to obtain the table plan for the following query in Flink 1.18.1. The input data is read from Kafka, and the query is intended to perform a typical WordCount operation. The query is as follows. SPLIT_STRING is a Table Function UDF that

Flink Session jobs goes to reconciling state

2024-07-08 Thread Fidea Lidea
Hi Team, I have a few session jobs for running jars. After creating jobs, the job goes from a running state to a reconciling state or upgrading state. How can I resolve this issue? [image: image.png] Thanks & Regards Nida Shaikh

java.lang.OutOfMemory:null

2024-07-07 Thread 冯路路
Hi Flink任务平稳运行一段时间,资源和数据都很平稳的情况下,一段时间后,忽然在解析json对象时报java.lang.OutOfMemory:null,然后cpu和内存就直线上升,直到完全将资源耗尽,报java.lang.OutOfMemory:java heap space,增加资源后,过一段时候会有同样的问题出现,这是什么原因,如果是内存泄漏,为什么会CPU和内存都完全平稳的运行一段时间,不应该是全程有直线上升的现象吗

Re: Parallelism of state processor jobs

2024-07-06 Thread Alexis Sarda-Espinosa
Hi Junrui, I think you understood correctly. What I'm seeing is that each vertex has a single subtask, but multiple vertices are started in parallel in different slots. That is not a problem in my case, I _want_ to parallelize the work, it's just that this mechanism is very different from

Re: Parallelism of state processor jobs

2024-07-06 Thread Junrui Lee
Hi Alexis, Could you clarify what you mean by "If I add more slots to the task manager, I see the transformations actually start in parallel even though I submit the job with 'flink run -p 1'"? Are you asking if multiple slots are working simultaneously, or if a single JobVertex contains multiple

flinkcdc????postgrep????????checkpoint????????????????????????

2024-07-06 Thread Eleven
PostgresSourceBuilder

Re: Parallelism of state processor jobs

2024-07-06 Thread Alexis Sarda-Espinosa
Hi Junrui, Thanks for the confirmation. I tested some more and I'm seeing a strange behavior. I'm currently testing a single source stream that is fed to 6 identical transformations. The state processor api requires batch mode and, from what I can tell, I must specify a parallelism of 1 in the

Re: Parallelism of state processor jobs

2024-07-05 Thread Junrui Lee
Hi Alexis, For the SavepointWriter, I've briefly looked over the code and the write operation is enforced as non-parallel. Best, Junrui Alexis Sarda-Espinosa 于2024年7月6日周六 01:27写道: > Hi Gabor, > > Thanks for the quick response. What about SavepointWriter? In my case I'm > actually writing a

Re: Parallelism of state processor jobs

2024-07-05 Thread Alexis Sarda-Espinosa
Hi Gabor, Thanks for the quick response. What about SavepointWriter? In my case I'm actually writing a job that will read from an existing savepoint and modify some of its data to write a new one. Regards, Alexis. Am Fr., 5. Juli 2024 um 17:37 Uhr schrieb Gabor Somogyi <

Re: Parallelism of state processor jobs

2024-07-05 Thread Gabor Somogyi
Hi Alexis, It depends. When one uses SavepointLoader to read metadata only then it's non-parallel. SavepointReader however is basically a normal batch job with all its features. G On Fri, Jul 5, 2024 at 5:21 PM Alexis Sarda-Espinosa < sarda.espin...@gmail.com> wrote: > Hello, > > Really quick

Parallelism of state processor jobs

2024-07-05 Thread Alexis Sarda-Espinosa
Hello, Really quick question, when using the state processor API, are all transformations performed in a non-parallel fashion? Regards, Alexis.

Flink State and Filesystem sink

2024-07-05 Thread Alexandre KY
Hello, I am trying to implement a satellite image processing chain. Satellite images are stored as rasters which are heavy, (several GBs) in a FileSystem (I am currently using HDFS for testing purpose but will move on S3 when I'll deploy it on the cloud). So in order to reduce the processing

?????? puzzle on OperatorChain

2024-07-04 Thread Enric Ott
Thanks for your confirmation,Yunfeng. ---- ??: "Yunfeng Zhou"

Re: puzzle on OperatorChain

2024-07-04 Thread Yunfeng Zhou
Hi Enric, Yes that even if there is only one operator, StreamTask will still create an OperatorChain for it. OperatorChain provides an abstract to process events like endInputs, checkpoints and OperatorEvents in a unified way, no matter how may operators are running in the StreamTask. You may

Task manager memory go on increasing on idle stage also

2024-07-04 Thread Ganesh Walse
Hi All, My task manager memory goes on increasing in idle stages also any reason why so. As a result of the above my job is failing. Thanks in advance. Thanks & regards, Ganesh Walse

puzzle on OperatorChain

2024-07-04 Thread Enric Ott
Hello,guys: Does Flink transform all operators(including source operator) to OperatorChain evendisableoperatorchaining was set to true and even the OperatorChain contains only one single Operator. Thanks.

Re: Postgres-CDC start replication fails after stop/start on flink stream

2024-07-04 Thread Yanquan Lv
Hi, David. We've met a similar problem of pg connection, the error message is 'Socket is closed' and we put a lot of effort into investigating, but we couldn't find the reason. Then we modify the publication mode[1] and only subscribe the changes of certain table with following connector options:

Re: watermark and barrier

2024-07-03 Thread Yunfeng Zhou
Hi Enric, OperatorCoordinator is a mechanism allowing subtasks of the same operator to communicate with each other and thus unifying the behavior of subtasks running on different machines. It has mainly been used in source operators to distribute source splits. As for watermarks, there are

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.9.0 released

2024-07-03 Thread Peter Huang
Thanks for the effort, Guyla! Best Regards Peter Huang On Wed, Jul 3, 2024 at 12:48 PM Őrhidi Mátyás wrote: > Thank you, Gyula! 拾 > Cheers > On Wed, Jul 3, 2024 at 8:00 AM Gyula Fóra wrote: > > > The Apache Flink community is very happy to announce the release of > Apache > > Flink

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.9.0 released

2024-07-03 Thread Őrhidi Mátyás
Thank you, Gyula! 拾 Cheers On Wed, Jul 3, 2024 at 8:00 AM Gyula Fóra wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Kubernetes Operator 1.9.0. > > The Flink Kubernetes Operator allows users to manage their Apache Flink > applications and their

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.9.0 released

2024-07-03 Thread Őrhidi Mátyás
Thank you, Gyula! 拾 Cheers On Wed, Jul 3, 2024 at 8:00 AM Gyula Fóra wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Kubernetes Operator 1.9.0. > > The Flink Kubernetes Operator allows users to manage their Apache Flink > applications and their

[ANNOUNCE] Apache Flink Kubernetes Operator 1.9.0 released

2024-07-03 Thread Gyula Fóra
The Apache Flink community is very happy to announce the release of Apache Flink Kubernetes Operator 1.9.0. The Flink Kubernetes Operator allows users to manage their Apache Flink applications and their lifecycle through native k8s tooling like kubectl. Release blogpost:

[ANNOUNCE] Apache Flink Kubernetes Operator 1.9.0 released

2024-07-03 Thread Gyula Fóra
The Apache Flink community is very happy to announce the release of Apache Flink Kubernetes Operator 1.9.0. The Flink Kubernetes Operator allows users to manage their Apache Flink applications and their lifecycle through native k8s tooling like kubectl. Release blogpost:

Flink LAG-Function doesn't work as expected

2024-07-03 Thread Brandl, Johann
Hi everyone, i’m new to flink and tried some queries with flink sql. Currently I have a problem with the LAG function. I want to emit a new record when the ORDERS_ID changes. To do this, I use the LAG function to detect whether this has changed. However, I noticed that every now and then I

Re: Flink write ADLS show error: No FileSystem for scheme "file"

2024-07-03 Thread Xiao Xu
Hi, Gabor, I'm curious about why this happened in Azure file and not in other file format(I tried use s3 and it works OK) Gabor Somogyi 于2024年7月2日周二 16:59写道: > I see, thanks for sharing. > > The change what you've made makes sense. Let me explain the details. > Each and every plugin has it's

  1   2   3   4   5   6   7   8   9   10   >