Re: Flink 1.20.0 missing linux/amd64 images

2024-08-04 Thread Bjarke Tornager
Hi Weijie, Thanks for looking into this. It looks like the docker-hub repo was updated with the linux/amd64 version, so I am able to pull it now. Best regards, Bjarke On Fri, Aug 2, 2024 at 3:00 PM weijie guo wrote: > Hi, > > Thanks for notifying us. Actually the `amd64` architecture is already in

Re: Using state processor for a custom windowed aggregate function

2024-08-02 Thread Alexis Sarda-Espinosa
Hi Matthias, Thank you for looking into this. That change makes the example work, but my real job still has issues. There is a key difference that might be causing the problem, but that's not so easy to replicate in the example I made. Essentially, I'm trying to modify the partition key of an

Re: Flink 1.20.0 missing linux/amd64 images

2024-08-02 Thread weijie guo
Hi, Thanks for notifying us. Actually the `amd64` architecture is already in the metadata file in the flink-docker image repo, but it's not available in the official docker-hub repo. That one is uploaded by the Docker officials; I will reach out to them asap. Best regards, Weijie Bjarke Tornager wrote on Fri, Aug 2, 2024

RE: Using state processor for a custom windowed aggregate function

2024-08-02 Thread Schwalbe Matthias
Hi Alexis, I've worked it out: The input of your com.test.Application.StateReader#readWindow(..., Iterable elements, ...) is of the projection type com.test.Application.AggregateFunctionForMigration: AggregateFunction<..., OUT = GenericService>. I.e. you need to implement
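
A hedged sketch of the reader side Matthias describes: the `Iterable` handed to `readWindow` contains the aggregate function's *output* (projection) values, not the raw window inputs. `GenericService` is stubbed here and the `String` key type is an assumption; only the `WindowReaderFunction` shape comes from Flink's state processor API.

```java
import org.apache.flink.state.api.functions.WindowReaderFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

class GenericService {} // placeholder for the job's real aggregate output type

public class StateReader
        extends WindowReaderFunction<GenericService, GenericService, String, GlobalWindow> {

    @Override
    public void readWindow(String key,
                           Context<GlobalWindow> context,
                           Iterable<GenericService> elements, // aggregated results, not raw inputs
                           Collector<GenericService> out) {
        for (GenericService aggregated : elements) {
            out.collect(aggregated);
        }
    }
}
```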

Re: S3 schema for jar location?

2024-08-02 Thread Ahmed Hamdy
Hi Maxim, You need to add the s3 filesystem in the Flink plugins directory in the operator to be able to work with S3, this is similar to any other Filesystem and similar to how Flink itself works. Flink offers 2 S3 filesystem implementations - flink-s3-fs-hadoop[1] for extension s3a://*** -

Flink 1.20.0 missing linux/amd64 images

2024-08-02 Thread Bjarke Tornager
Hi, It looks like no Flink 1.20.0 linux/amd64 images are published on the official Flink Dockerhub repo, however the linux/arm64 images are available. Will you publish the images for linux/amd64 as well? Best regards, Bjarke Tornager

Re: Checkpoint failures due to other subtasks sharing the ChannelState file (Caused the Job to Stall)

2024-08-02 Thread Dhruv Patel
We have also enabled unaligned checkpoints. Could it be because of that? We were experiencing slowness and intermittent packet loss when this issue occurred. On Wed, Jul 31, 2024 at 7:43 PM Dhruv Patel wrote: > Hi Everyone, > > We are observing an interesting issue with continuous checkpoint

Re: Re: Changing GC settings of Taskmanager

2024-08-01 Thread Xuyang
Hi, Banu. In your description, the only stateful node in the entire job is the sliding window. The fluctuations in the size of the state will be affected by the window size (2 seconds). Theoretically, after the window is triggered, its state can be immediately cleared, thus its size will be

S3 schema for jar location?

2024-08-01 Thread Maxim Senin via user
When will Flink Operator support schemas other than `local` for application deployment jar files? I just tried flink operator 1.9 and it’s still not working with `s3` locations. If s3 is good for savepoints and checkpoints, why can’t the jar also be on s3? Thanks, Maxim

Re: checkpoint upload thread

2024-08-01 Thread Enric Ott
Thanks for the clarification, Yanfei. And I will dig into it deeper later. Original message | From: "Yanfei Lei"

Re: checkpoint upload thread

2024-08-01 Thread Yanfei Lei
Hi Enric, Sorry for the confusion, I mean "It can be done theoretically, and it depends on the specific implementation of the file system client in fact." I think there are two ways to let different tasks share a connection (In other words: "socket"): 1. Share one *Output Stream*; 2. Use

Reply: flink on yarn mode, how to get the rest port inside a jar job

2024-08-01 Thread wjw_bigdata
Unsubscribe. Original message | From | Lei Wang | | Date | 2024-08-01 14:08 | | To | | | Subject | Re: flink on yarn mode, how to get the rest port inside a jar job | You can specify rest.port in flink-conf.yaml; a range can be given. On Wed, Jul 31, 2024 at 8:44 PM melin li wrote: In flink on yarn mode the rest port is random; we need to obtain the rest port. Is there a good way?

Reply: Does backpressure affect the watermark mechanism?

2024-08-01 Thread wjw_bigdata
Unsubscribe. Original message | From | wjw_bigdata | | Date | 2024-08-01 14:22 | | To | user-zh@flink.apache.org | | Subject | Reply: Does backpressure affect the watermark mechanism? | Unsubscribe. Original message | From | wjw_bigdata | | Date | 2024-08-01 14:21 | | To | user-zh@flink.apache.org | | Cc | user-zh@flink.apache.org |

Reply: Does backpressure affect the watermark mechanism?

2024-08-01 Thread wjw_bigdata
Unsubscribe. Original message | From | wjw_bigdata | | Date | 2024-08-01 14:21 | | To | user-zh@flink.apache.org | | Cc | user-zh@flink.apache.org | | Subject | Reply: Does backpressure affect the watermark mechanism? | Unsubscribe. Original message | From | jwfktv | | Date | 2024-08-01 14:20 | | To | user-zh@flink.apache.org | |

Reply: Does backpressure affect the watermark mechanism?

2024-08-01 Thread wjw_bigdata
Unsubscribe. Original message | From | jwfktv | | Date | 2024-08-01 14:20 | | To | user-zh@flink.apache.org | | Subject | Does backpressure affect the watermark mechanism? | Hi: Hello everyone. I've recently been using PyFlink 1.18.0's window deduplication to fetch the last record of each minute and write it to a database. The data source is Kafka, and event time uses the timestamp field from the Kafka record metadata.

Does backpressure affect the watermark mechanism?

2024-08-01 Thread jwfktv
Hi: Hello everyone. I've recently been using PyFlink 1.18.0's window deduplication to fetch the last record of each minute and write it to a database. The data source is Kafka, and event time uses the timestamp field from the Kafka record metadata. Day-to-day processing is fine, but as soon as I backfill the current day's data, strange problems appear. For example, I have a topic with a single partition and roughly 80 to 100 million records per day; the job is set to parallelism 1 and implemented with Flink SQL.

Re: flink on yarn mode, how to get the rest port inside a jar job

2024-08-01 Thread Lei Wang
You can specify rest.port in flink-conf.yaml; a range can be given. On Wed, Jul 31, 2024 at 8:44 PM melin li wrote: > In flink on yarn mode the rest port is random; we need to obtain the rest port. Is there a good way? >

Checkpoint failures due to other subtasks sharing the ChannelState file (Caused the Job to Stall)

2024-07-31 Thread Dhruv Patel
Hi Everyone, We are observing an interesting issue with continuous checkpoint failures in our job causing the event to not be forwarded through the pipeline. We saw a spam of the below log in all our task manager instances. Caused by: org.apache.flink.runtime.checkpoint.CheckpointException:

Re: checkpoint upload thread

2024-07-31 Thread Enric Ott
Hi, Yanfei: What do you mean by the word "possible" in the statement "it is possible to use the same connection for an operator chain"? Meaning able to be done but not applied in fact? Or actually applied, but with some probability? Thanks. Original message | From:

Re: KafkaSink self-defined RoundRobinPartitioner not able to discover new partitions

2024-07-31 Thread Lei Wang
It seems I needn't define a FlinkRoundRobinPartitioner and can just use the RoundRobinPartitioner supplied by kafka: props.setProperty(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class.getName()); In this way the new partitions will be found dynamically. However, there's a bug
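
A minimal sketch of the approach Lei describes: handing partitioning to Kafka's own `RoundRobinPartitioner` through the producer properties of a `KafkaSink`. The broker address and topic are placeholders; the bug he alludes to is truncated in the preview and not addressed here.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;

import java.util.Properties;

Properties props = new Properties();
// Kafka's own partitioner spreads records round-robin and picks up newly
// added partitions through the producer's periodic metadata refresh.
props.setProperty(ProducerConfig.PARTITIONER_CLASS_CONFIG,
        RoundRobinPartitioner.class.getName());

KafkaSink<String> kafkaSink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092") // placeholder address
        .setKafkaProducerConfig(props)
        .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("my-topic") // placeholder topic
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .build();
```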

Re: [Request Help] Flink StreamRecord granularity latency metrics

2024-07-31 Thread Lei Wang
Hi Yubin, We implement it in this manner. For every record, we define several time fields. When the record first enters the system, we set one field to the current time. After several complex calculation operators, we set another field to the current time. Then we just calculate the difference between the two values.
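
A hedged sketch of this stamp-and-diff approach: the event type, field names, and metric name are all hypothetical; only the `RichMapFunction`/`Gauge` API is Flink's.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;

// Hypothetical event type carrying the two time fields described above.
class TimedEvent {
    long ingestTimeMillis;    // stamped when the record first enters the system
    long processedTimeMillis; // stamped after the heavy operators
    String payload;
}

public class LatencyStampingMap extends RichMapFunction<TimedEvent, TimedEvent> {
    private transient long lastLatencyMillis;

    @Override
    public void open(Configuration parameters) {
        // Expose the most recently observed latency via Flink's metric system.
        getRuntimeContext().getMetricGroup()
                .gauge("recordLatencyMillis", (Gauge<Long>) () -> lastLatencyMillis);
    }

    @Override
    public TimedEvent map(TimedEvent event) {
        event.processedTimeMillis = System.currentTimeMillis();
        lastLatencyMillis = event.processedTimeMillis - event.ingestTimeMillis;
        return event;
    }
}
```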

StreamTaskException: Could not serialize object for key serializedUDF

2024-07-31 Thread dz902
Hi, I'm using Flink 1.17.1 streaming API, on YARN. My app first got stuck at process function serialization. I know Avro Schema is not serializable, so I removed all references from my process functions. Now it passes the first round, but gets stuck again at the following error:

Re: Elasticsearch 8.x Connector in Maven

2024-07-31 Thread Rion Williams
Hi Ahmed, Thanks for the response, I'll reach out to the dev list and go from there. Thanks again, Rion On Jul 31, 2024, at 9:18 AM, Ahmed Hamdy wrote: Hi Rion, It seems that ES 8 was supported ahead of the 3.1 release[1], which seems to not be released yet, hence not published to maven. Given the

Re: Elasticsearch 8.x Connector in Maven

2024-07-31 Thread Ahmed Hamdy
Hi Rion, It seems that ES 8 was supported ahead of the 3.1 release[1], which seems to not be released yet, hence not published to maven. Given the importance of ES 8 and the fact that the Elasticsearch connector still depends on Flink 1.18 while we are releasing 1.20, I would suggest nudging the dev list[2] for a

Re: Using state processor for a custom windowed aggregate function

2024-07-31 Thread Alexis Sarda-Espinosa
Hi again, I realized it's easy to create a reproducible example, see this specific commit: https://github.com/asardaes/test-flink-state-processor/commit/95e65f88fd1e38bcba63ebca68e3128789c0d2f2 When I run that application, I see the following output: Savepoint created

Re: Using state processor for a custom windowed aggregate function

2024-07-31 Thread Alexis Sarda-Espinosa
Hi Matthias, This indeed compiles, I am able to actually generate a savepoint, it's just that all the window states in that savepoint appear to be null. The second argument of withOperator(...) is specified via OperatorTransformation...aggregate(), so the final transformation is built by

flink on yarn mode, how to get the rest port inside a jar job

2024-07-31 Thread melin li
In flink on yarn mode the rest port is random; we need to obtain the rest port. Is there a good way?

Re: Elasticsearch 8.x Connector in Maven

2024-07-31 Thread Rion Williams
Hi again all, Just following up on this as I've scoured around trying to find any documentation for using the ES 8.x connector; however, everything only appears to reference 6/7. The ES 8.x connector seems to have been out for quite a bit of time, so I'm curious how others are using it. I'd really

RE: Using state processor for a custom windowed aggregate function

2024-07-31 Thread Schwalbe Matthias
Hi Alexis, Just a couple of points to double-check: * Does your code compile? (the second argument of withOperator(..) should derive StateBootstrapTransformation instead of SingleOutputStreamOperator) * From the documentation of savepoint API you’ll find examples for each type of state

[Request Help] Flink StreamRecord granularity latency metrics

2024-07-31 Thread Yubin Li
Hi everyone, We are focusing on improving observability for Flink. We have a vision to make the latency of every business stream record observable. Is there any idea how to implement this feature? Looking forward to your suggestions!

Re: Changing GC settings of Taskmanager

2024-07-30 Thread Xuyang
Hi, Banu. Could you check whether the "Configuration" icon under the "Task Managers" and "Job Manager" buttons on the left side of the Flink-UI shows that the currently effective flink conf includes these JVM changes? I suspect that you are using a session cluster mode, where changes to the

Elasticsearch 8.x Connector in Maven

2024-07-30 Thread Rion Williams
Hi all, I see that the Elasticsearch Connector for 8.x is supported per the repo (and completed JIRAs). Is there a way to reference this via Maven? Or is it required to build the connector from the source directly? We recently upgraded an Elasticsearch cluster to 8.x and some of the writes are

Re: checkpoint upload thread

2024-07-30 Thread Yanfei Lei
Hi Enric, If I understand correctly, one subtask would use one `asyncOperationsThreadPool`[1,2], it is possible to use the same connection for an operator chain. [1]

KafkaSink self-defined RoundRobinPartitioner not able to discover new partitions

2024-07-30 Thread Lei Wang
I wrote a FlinkRoundRobinPartitioner extending FlinkKafkaPartitioner and use it as follows: KafkaSink kafkaSink = KafkaSink.builder() .setBootstrapServers(sinkServers).setKafkaProducerConfig(props) .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)

Re: custom metric reporter

2024-07-30 Thread Dominik.Buenzli
Hi Sigalit, Did you add the factory's class name to the META-INF/services folder of the reporter? The content of my file is: org.apache.flink.metrics.custom.NestedGaugePrometheusReporterFactory Kind regards Dominik From: Sigalit Eliazov Date: Monday, 29 July 2024
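
A minimal sketch of the ServiceLoader wiring Dominik refers to, with class names modeled on the thread (the reporter implementation itself is not shown); only the `MetricReporterFactory` interface is Flink's.

```java
// src/main/java/xxx/flink/metrics/otlp/OpenTelemetryProtocolReporterFactory.java
import org.apache.flink.metrics.reporter.MetricReporter;
import org.apache.flink.metrics.reporter.MetricReporterFactory;

import java.util.Properties;

public class OpenTelemetryProtocolReporterFactory implements MetricReporterFactory {
    @Override
    public MetricReporter createMetricReporter(Properties properties) {
        return new OpenTelemetryProtocolReporter(); // your reporter, not shown here
    }
}

// src/main/resources/META-INF/services/org.apache.flink.metrics.reporter.MetricReporterFactory
// must contain exactly the fully qualified factory name, e.g.:
// xxx.flink.metrics.otlp.OpenTelemetryProtocolReporterFactory
```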

checkpoint upload thread

2024-07-29 Thread Enric Ott
Hi, Community: Does Flink upload states and in-flight buffers within the same operator chain using the same connection (instead of one connection per operator)?

Re: [Request Help] flinkcdc start with error java.lang.NoClassDefFoundError: org/apache/flink/cdc/common/sink/MetadataApplier

2024-07-29 Thread 424767284
Hi Xiqian, Thanks. I checked the Flink Home /lib directory and found I had put flink-cdc-pipeline-connector-mysql-3.1.jar and flink-cdc-pipeline-connector-doris-3.1.jar in the /lib directory. I removed these two and it works now. I think it was probably a jar conflict. I had used flink-cdc 3.0 before and it worked fine. Regards,

custom metric reporter

2024-07-29 Thread Sigalit Eliazov
Hi, we have upgraded from Flink 1.16 to 1.18 and our custom metric reporter stopped working. I saw in the release notes of 1.17 that there was a change, so I have defined the following: metrics.reporters: otlp metrics.reporter.otlp.factory.class: xxx.flink.metrics.otlp.OpenTelemetryProtocolReporterFactory

Using state processor for a custom windowed aggregate function

2024-07-29 Thread Alexis Sarda-Espinosa
Hello, I am trying to create state for an aggregate function that is used with a GlobalWindow. This basically looks like: savepointWriter.withOperator( OperatorIdentifier.forUid(UID), OperatorTransformation.bootstrapWith(stateToMigrate) .keyBy(...)
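
A hedged sketch of how the bootstrap chain in the snippet above typically continues for a global-window aggregate. The `.window(...)`/`.aggregate(...)` steps, the `InputRecord` type, and the key selector are assumptions inferred from the thread, not Alexis' actual code; `savepointWriter`, `UID`, `stateToMigrate`, and `AggregateFunctionForMigration` are names from the thread.

```java
import org.apache.flink.state.api.OperatorIdentifier;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.StateBootstrapTransformation;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;

// InputRecord is a placeholder for the aggregator's input type;
// stateToMigrate is the DataStream<InputRecord> being bootstrapped.
StateBootstrapTransformation<InputRecord> transformation =
        OperatorTransformation.bootstrapWith(stateToMigrate)
                .keyBy(record -> record.getKey())                // placeholder key selector
                .window(GlobalWindows.create())
                .aggregate(new AggregateFunctionForMigration()); // name from the thread

savepointWriter.withOperator(OperatorIdentifier.forUid(UID), transformation);
```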

[ANNOUNCE] Apache Celeborn 0.5.1 available

2024-07-29 Thread Ethan Feng
Hello all, Apache Celeborn community is glad to announce the new release of Apache Celeborn 0.5.1. Celeborn is dedicated to improving the efficiency and elasticity of different map-reduce engines and provides an elastic, highly efficient service for intermediate data including shuffle data,

[ANNOUNCE] Apache Celeborn 0.4.2 available

2024-07-29 Thread Fu Chen
Hello all, Apache Celeborn community is glad to announce the new release of Apache Celeborn 0.4.2. Celeborn is dedicated to improving the efficiency and elasticity of different map-reduce engines and provides an elastic, highly efficient service for intermediate data including shuffle data,

Re: Expose rocksdb options for flush thread.

2024-07-29 Thread Gabor Somogyi
This hasn't moved for a while, so I've assigned it to you. G On Mon, Jul 15, 2024 at 9:06 AM Zhongyou Lee wrote: > Hello everyone: > > Up to now, the only way to adjust the RocksDB flush thread count is for the user to implement > ConfigurableRocksDBOptionsFactory #setMaxBackgroundFlushes. I found >

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-29 Thread Gabor Somogyi
I've double checked and I think that CollectSinkOperatorFactory is initialized in DataStream.collectAsync without MAX_BATCH_SIZE and SOCKET_TIMEOUT values coming from the Flink config. Could you plz share the whole stacktrace to double check my assumption? G On Tue, Jul 23, 2024 at 12:46 PM

Re: [Request Help] flinkcdc start with error java.lang.NoClassDefFoundError: org/apache/flink/cdc/common/sink/MetadataApplier

2024-07-28 Thread Xiqian YU
Hi Qijun, This error message usually implies that required dependencies were not met. Could you please confirm if you’ve placed all .jar files like this: * Flink Home * /lib * mysql-connector-java-8.0.27.jar * … other pre-bundled jars * Flink CDC Home *

Unsubscribe

2024-07-28 Thread 戴鹏
Unsubscribe

[Request Help] flinkcdc start with error java.lang.NoClassDefFoundError: org/apache/flink/cdc/common/sink/MetadataApplier

2024-07-26 Thread 424767284
Hi: I followed the guide at https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/get-started/quickstart/mysql-to-doris/, started it with flink-cdc.sh, and got the error java.lang.NoClassDefFoundError: org/apache/flink/cdc/common/sink/MetadataApplier. Is there anything wrong

Re: Tuning rocksdb configuration

2024-07-26 Thread Zakelly Lan
Hi Banupriya, Sometimes an sst file will not be compacted and will be referenced for a long time. That depends on how rocksdb picks files for compaction. It may happen when some range of keys is never touched at some point in time, since rocksdb only takes care of the files or key ranges that

Re: Tuning rocksdb configuration

2024-07-26 Thread Zakelly Lan
Hi Banu, I'm trying to answer your questions briefly: 1. Yes, when the memtable reaches the value you configured, a flush will be triggered. And no, sst files have a different format from memtables; the size is smaller than 64mb IIUC. 2. Typically you don't need to change this value. If it is set
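
A hedged sketch of the knobs this answer most plausibly refers to, the managed write buffer settings from Flink's RocksDB configurable options; the values shown are the documented defaults, not recommendations.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration config = new Configuration();
// Flush a memtable to an sst file once it reaches this size (64mb is the default).
config.setString("state.backend.rocksdb.writebuffer.size", "64mb");
// Number of memtables kept per column family before writes stall.
config.setString("state.backend.rocksdb.writebuffer.count", "2");

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment(config);
```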

Access to S3 - checkpoints

2024-07-25 Thread Sigalit Eliazov
Hi, We are using Ceph buckets to store the checkpoints and savepoints, and the access is done via the S3 protocol. Since we don't have any integration with Hadoop, we added a dependency on flink-s3-fs-presto. Our Flink configuration looks like this: state.checkpoint-storage:

Tuning rocksdb configuration

2024-07-25 Thread banu priya
Hi All, I have a flink job with RMQ Source, filters, tumbling window (uses processing time, fires every 2s), aggregator, RMQ Sink. Incremental rocksdb checkpoints are enabled every 10s with a minimum pause between checkpoints of 5s. My checkpoint size keeps increasing, so I am planning to tune

Setting web.submit.enable to false doesn't allow flinksessionjobs to work when running in Kubernetes

2024-07-24 Thread Ralph Blaise
Setting web.submit.enable to false in a FlinkDeployment deployed to Kubernetes doesn't allow FlinkSessionJobs for it to work. It instead results in the error below:

Re: Troubleshooting checkpoint expiration

2024-07-23 Thread Alexis Sarda-Espinosa
Hi again, I found a Hadoop class that can log latency information [1], but since I don't see any exceptions in the logs when a checkpoint expires due to timeout, I'm still wondering if I can change other log levels to get more insights, maybe somewhere in Flink's file system abstractions? [1]

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-23 Thread Salva Alcántara
Hi all, Just to share my findings so far. Regarding tweaking the setting, it has been impossible for me to do so. So, the only way to work around this has been to duplicate some Flink code directly to allow the tweak. More precisely, this is how my code looks now (kudos to my dear

Re: flight buffer local storage

2024-07-22 Thread Enric Ott
Thanks, Zhanghao. I think it's the async upload mechanism that helped mitigate the in-flight buffer materialization latency, and the execution vertex restart procedure just reads the in-flight buffers and the local TaskStateSnapshots to get its job done.

Re: Flink state

2024-07-22 Thread Saurabh Singh
Hi Banu, RocksDB is intelligently built to clear any no-longer-useful state from its purview. So you should be good, and any required cleanup will be done automatically by RocksDB itself. From the current documentation, it looks quite hard to relate Flink internal DS activity to RocksDB DS activity. In

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-22 Thread Salva Alcántara
The same happens with this slight variation: ``` Configuration config = new Configuration(); config.setString("collect-sink.batch-size.max", "100mb"); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.configure(config); SavepointReader savepoint =

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-22 Thread Salva Alcántara
Hi Zhanghao, Thanks for your suggestion. Unfortunately, this does not work, I still get the same error message: ``` Record size is too large for CollectSinkFunction. Record size is 9623137 bytes, but max bytes per batch is only 2097152 bytes. Please consider increasing max bytes per batch value

Re: Flink Slot request bulk is not fulfillable!

2024-07-22 Thread Saurabh Singh
Hi Li, The error suggests that Job is not able to acquire the required TaskManager TaskSlots within the configured time duration of 5 minutes. Job Runs on the TaskManagers (Worker Nodes). Helpful Link -

Flink Slot request bulk is not fulfillable!

2024-07-21 Thread Li Shao
Hi All, We are using flink batch mode to process s3 files. However, recently we are seeing the errors like: Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate

Re: Flink state

2024-07-21 Thread banu priya
Dear Community, Gentle reminder about my below email. Thanks Banu On Sat, 20 Jul, 2024, 4:37 pm banu priya, wrote: > Hi All, > > I have a flink job with RMQ Source, filters, tumbling window(uses > processing time fires every 2s), aggregator, RMQ Sink. > > I am trying to understand about

Re: flight buffer local storage

2024-07-21 Thread Zhanghao Chen
By default, Flink uses aligned checkpoints, where we wait for all in-flight data before the barriers to be fully processed and then take the checkpoint. There's no need to store in-flight buffers in this case, at the cost of additional barrier alignment, which may take a long time at the
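
A minimal sketch of the switch this answer contrasts with aligned checkpoints; the interval is arbitrary and the aligned-timeout line is optional (assuming Flink 1.14+).

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.time.Duration;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10_000); // checkpoint every 10s (arbitrary)
// Persist in-flight buffers with the checkpoint so barriers can overtake
// backpressured data instead of waiting for alignment.
env.getCheckpointConfig().enableUnalignedCheckpoints();
// Optional: stay aligned normally, fall back to unaligned under backpressure.
env.getCheckpointConfig().setAlignedCheckpointTimeout(Duration.ofSeconds(30));
```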

Re: SavepointReader: Record size is too large for CollectSinkFunction

2024-07-21 Thread Zhanghao Chen
Hi, you could increase it as follows: Configuration config = new Configuration(); config.setString("collect-sink.batch-size.max", "10mb"); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(config); From: Salva Alcántara Sent:

Flink state

2024-07-20 Thread banu priya
Hi All, I have a flink job with RMQ Source, filters, tumbling window(uses processing time fires every 2s), aggregator, RMQ Sink. I am trying to understand about states and checkpoints(enabled incremental rocksdb checkpoints). In local rocks db directory, I have .sst files, log, lock, options

SavepointReader: Record size is too large for CollectSinkFunction

2024-07-20 Thread Salva Alcántara
Hi all! I'm trying to debug a job via inspecting its savepoints but I'm getting this error message: ``` Caused by: java.lang.RuntimeException: Record size is too large for CollectSinkFunction. Record size is 9627127 bytes, but max bytes per batch is only 2097152 bytes. Please consider increasing

Re: 如何基于FLIP-384扩展对业务数据全链路延时情况的监控

2024-07-19 Thread YH Zhu
Unsubscribe. Yubin Li wrote on Thu, Jul 18, 2024 at 14:23: > Hi, all > FLIP-384[1] currently provides trace observability for checkpointing and task recovery, but real business scenarios often require monitoring the latency of each business record as it flows through every node of the pipeline. Are there any good ideas for this? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces > > Best,

Troubleshooting checkpoint expiration

2024-07-19 Thread Alexis Sarda-Espinosa
Hello, We have a Flink job that uses ABFSS for checkpoints and related state. Lately we see a lot of exceptions due to expiration of checkpoints, and I'm guessing that's an issue in the infrastructure or on Azure's side, but I was wondering if there are Flink/Hadoop Java packages that log

Re: Event de duplication in flink with rabbitmq connector

2024-07-18 Thread Ahmed Hamdy
Yes, the current implementation doesn't leverage transactions on publish like it does for the source when acking and nacking deliveries. You can raise a ticket to support exactly-once RMQ sinks with the community or implement the logic yourself. "My checkpoint size is increasing; can this

Re: Flink job throws ClassNotFoundException at runtime

2024-07-18 Thread Yanquan Lv
Hello, assuming xxx.shade. is the prefix you use for shading: do grep -rn 'org.apache.hudi.com.xx.xx.xxx.A' and grep -rn 'xxx.shade.org.apache.hudi.com.xx.xx.xxx.A' give consistent results? ℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> wrote on Thu, Jul 18, 2024 at 20:14: > Hello, thanks for your reply. > As I understand it, shading was applied everywhere; I checked with your grep -rn command and found no problem. Moreover, this >

Re: Event de duplication in flink with rabbitmq connector

2024-07-18 Thread Ahmed Hamdy
Hi Banu, This behavior of source is expected, the guarantee of the RMQSource is exactly once which is achieved by acknowledging envelopes on checkpoints hence the source would never re-read a message after checkpoint even if it was still inside the pipeline and not yet passed to sink, eager

Re: Flink job throws ClassNotFoundException at runtime

2024-07-18 Thread Yanquan Lv
Hello, this class was shaded, but other classes that call it may live in different jars that were not all shaded. You can grep -rn 'org.apache.hudi.com.xx.xx.xxx.A' to check whether every package calling this class was shaded. ℡小新的蜡笔不见嘞、 <1515827...@qq.com.invalid> wrote on Thu, Jul 18, 2024 at 18:31: > Question: during Flink job execution, a ClassNotFoundException is occasionally thrown. What usually causes this and how can it be resolved? Details: > * The class does exist in the job jar > * The class was

Read avro files with wildcard

2024-07-18 Thread irakli.keshel...@sony.com
Hi all, I have a Flink application where I need to read Avro files from s3 that are partitioned by date and hour. I need to read multiple dates, meaning I need to read files from multiple folders. Does anyone know how I can do this? My application is written in Scala using Flink 1.17.1.
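
One hedged way to read several partition folders: build one source per folder and union the streams. Shown in Java rather than the thread's Scala; the bucket layout and the `MyEvent` generated Avro class are hypothetical, and only `AvroInputFormat`/`createInput`/`union` are Flink APIs.

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.AvroInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.List;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Hypothetical partition layout: s3://bucket/events/date=YYYY-MM-DD/hour=HH/
List<String> partitions = List.of(
        "s3://bucket/events/date=2024-07-17/hour=09/",
        "s3://bucket/events/date=2024-07-17/hour=10/",
        "s3://bucket/events/date=2024-07-18/hour=09/");

DataStream<MyEvent> merged = null; // MyEvent: generated Avro SpecificRecord (placeholder)
for (String dir : partitions) {
    AvroInputFormat<MyEvent> format =
            new AvroInputFormat<>(new Path(dir), MyEvent.class);
    DataStream<MyEvent> part = env.createInput(format);
    merged = (merged == null) ? part : merged.union(part);
}
```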

Re: Event de duplication in flink with rabbitmq connector

2024-07-18 Thread banu priya
Hi All, Gentle reminder about the below query. Thanks Banu On Tue, 9 Jul, 2024, 1:42 pm banu priya, wrote: > Hi All, > > I have a Flink job with a RMQ source, tumbling windows (fires for each > 2s), an aggregator, then a RMQ sink. Incremental RocksDB checkpointing is > enabled with an interval of 5

How to extend FLIP-384 to monitor end-to-end latency of business data

2024-07-18 Thread Yubin Li
Hi, all FLIP-384[1] currently provides trace observability for checkpointing and task recovery, but real business scenarios often require monitoring the latency of each business record as it flows through every node of the pipeline. Are there any good ideas for this? [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces Best, Yubin

Re: Reply: Reply: Problem using the Hive catalog

2024-07-17 Thread Xuyang
Hi, I tried it: flink-connector-kafka-3.2.0-1.19.jar needs to be replaced with flink-sql-connector-kafka-3.2.0-1.19.jar. The download link is in the sql client column of the documentation[1], and that jar does contain OffsetResetStrategy. Can you try again with this jar? [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/table/kafka/#dependencies -- Best!

Re: Buffer Priority

2024-07-17 Thread Enric Ott
Oh, it's designed for unaligned checkpoints. Thanks, Zhanghao. Original message | From: "Zhanghao Chen"

Unsubscribe

2024-07-17 Thread Phil Stavridis
Unsubscribe

Re: Event-Driven Window Computation

2024-07-17 Thread Xuyang
Hi, As far as I know, the community currently has no plans to support custom triggers in Flink SQL, because it is difficult to describe triggers using SQL. You can create a jira[1] for it and restart the discussion on the dev mailing list. [1] https://issues.apache.org/jira/projects/FLINK

Event-Driven Window Computation

2024-07-17 Thread liu ze
Hi, Currently, Flink's windows are based on time (or a fixed number of elements). I want to trigger window computation based on specific events (marked within the data). In the DataStream API, this can be achieved using GlobalWindow and custom triggers, but how can it be done in Flink SQL?
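
A minimal DataStream-side sketch of the marker-triggered approach the question mentions (not a Flink SQL answer): a custom trigger on a global window that fires when a flagged event arrives. The `Event` type and its `endOfBatch` flag are hypothetical.

```java
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Hypothetical event with a marker flag embedded in the data.
class Event {
    String key;
    boolean endOfBatch; // the "specific event" that should close the window
}

public class MarkerTrigger extends Trigger<Event, GlobalWindow> {
    @Override
    public TriggerResult onElement(Event element, long timestamp,
                                   GlobalWindow window, TriggerContext ctx) {
        // Fire and clear the window when the marker arrives; otherwise keep buffering.
        return element.endOfBatch ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) {}
}

// usage: stream.keyBy(e -> e.key).window(GlobalWindows.create()).trigger(new MarkerTrigger()).reduce(...)
```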

RE: Trying to read a file from S3 with flink on kubernetes

2024-07-17 Thread gwenael . lebarzic
Hello everyone. In fact, the problem was coming from FileSystem.get(): ### val fs = FileSystem.get(hadoopConfig) ### When you want to interact with S3, you need to add a first parameter, before the hadoop config, to specify the filesystem. Something like this: ### val s3uri =
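
A Java rendering of the fix Gwenael describes (the thread's code is Scala; the bucket URI is a placeholder): Hadoop's two-argument `FileSystem.get(URI, Configuration)` resolves the S3 filesystem instead of the configured default one.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

Configuration hadoopConfig = new Configuration();
// Without an explicit URI, FileSystem.get(conf) returns the *default*
// filesystem (usually file:// or hdfs://), not S3.
URI s3Uri = URI.create("s3a://my-bucket"); // placeholder bucket
FileSystem fs = FileSystem.get(s3Uri, hadoopConfig);

// The returned filesystem now resolves paths against the bucket.
boolean exists = fs.exists(new Path("s3a://my-bucket/input/data.csv"));
```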

flight buffer local storage

2024-07-17 Thread Enric Ott
Hello, Community: Why doesn't flink store in-flight buffers to local disks when it checkpoints? Thanks.

Re: Encountering scala.matchError in Flink 1.18.1 Query

2024-07-17 Thread Norihiro FUKE
Hi Xuyang, Thank you for the information regarding the bug fix. I will proceed with the method of joining input_table and udtf first. Thank you for the suggestion. Best regards, Norihiro Fuke. 2024年7月15日(月) 10:43 Xuyang : > Hi, this is a bug fixed in >

Reply: Reply: Problem using the Hive catalog

2024-07-17 Thread 冯奇
Flink 1.19, Hive 3.1.2. Creating the table with the new parameters: CREATE TABLE mykafka (name String, age Int) WITH ( 'connector' = 'kafka', 'topic' = 'test', 'properties.bootstrap.servers' = 'localhost:9092', 'properties.group.id' = 'testGroup', 'scan.startup.mode' = 'earliest-offset', 'format' = 'csv' );

Re: When reading FTP files in real time with Flink via InputFormatSourceFunction, fetching the next split fails

2024-07-16 Thread YH Zhu
Unsubscribe. Px New <15701181132mr@gmail.com> wrote on Tue, Jul 16, 2024 at 22:52: > I implemented a version via the old API, i.e. InputFormatSourceFunction/InputFormat, and found that the first batch of files (those already present at job start) is processed normally, but after I upload new files nothing is picked up. Any ideas? > > Or is there another way to implement real-time reading of an FTP directory? Ideally it should > 1. Read FTP files in real time > 2. Continuously monitor the directory and recurse into subdirectories and files > 3.

When reading FTP files in real time with Flink via InputFormatSourceFunction, fetching the next split fails

2024-07-16 Thread Px New
I implemented a version via the old API, i.e. InputFormatSourceFunction/InputFormat, and found that the first batch of files (those already present at job start) is processed normally, but after I upload new files nothing is picked up. Any ideas? Or is there another way to implement real-time reading of an FTP directory? Ideally it should: 1. Read FTP files in real time 2. Continuously monitor the directory and recurse into subdirectories and files 3. Support parallel reading and splitting of large files 4. Handle file types such as json, txt, zip, etc., reading data from each type 5. Support resuming from breakpoints and saving state

Re: Reply: Problem using the Hive catalog

2024-07-16 Thread Feng Jin
The example above seems to use the old kafka connector parameters. Refer to the documentation for the new parameters: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_catalog/#step-4-create-a-kafka-table-with-flink-sql-ddl You also need to put the kafka connector [1] into the lib directory. [1]

Re: Reply: Problem using the Hive catalog

2024-07-16 Thread Xuyang
You need to place flink-sql-connector-hive-3.1.3 in the lib directory; that jar is the one for SQL jobs. -- Best! Xuyang At 2024-07-16 13:40:23, "冯奇" wrote: > I checked the docs; several jars are there, plus a separately downloaded dependency flink-sql-connector-hive-3.1.3. I'm not sure whether to use that one or the ones below. > // Flink's Hive connector flink-connector-hive_2.12-1.19.1.jar // Hive > dependencies

Reply: Problem using the Hive catalog

2024-07-15 Thread 冯奇
I checked the docs; several jars are there, plus a separately downloaded dependency flink-sql-connector-hive-3.1.3. I'm not sure whether to use that one or the following: // Flink's Hive connector flink-connector-hive_2.12-1.19.1.jar // Hive dependencies hive-exec-3.1.0.jar libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately // add

Re: Problem using the Hive catalog

2024-07-15 Thread Xuyang
Hi, can you check whether the hive sql connector dependency [1] has been placed in the lib directory or added via ADD JAR? [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/overview/ -- Best! Xuyang At 2024-07-15 17:09:45, "冯奇" wrote: > Flink SQL> USE CATALOG myhive; > Flink SQL> CREATE TABLE mykafka

RE: Taskslots usage

2024-07-15 Thread Alexandre KY
Hello, Thank you for your answers, I now understand Flink's behavior. Thank you and best regards, Ky Alexandre From: Aleksandr Pilipenko Sent: Friday, 12 July 2024 19:42:06 To: Alexandre KY Cc: user Subject: Re: Taskslots usage Hello Alexandre, Flink

Problem using the Hive catalog

2024-07-15 Thread 冯奇
Flink SQL> USE CATALOG myhive; Flink SQL> CREATE TABLE mykafka (name String, age Int) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'hive_sink', 'connector.properties.bootstrap.servers' = '10.0.15.242:9092', 'format.type' = 'csv', 'update-mode' =

Expose rocksdb options for flush thread.

2024-07-14 Thread Zhongyou Lee
Hello everyone: Up to now, the only way to adjust the RocksDB flush thread count is for the user to implement ConfigurableRocksDBOptionsFactory #setMaxBackgroundFlushes. I found FLINK-22059 addressing this problem. The PR has never been completed; I want to finish it. Can anyone assign this PR
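
A hedged sketch of the workaround Zhongyou names: a ConfigurableRocksDBOptionsFactory raising the background flush thread count. setMaxBackgroundFlushes is the method named in the thread (deprecated in newer RocksDB in favor of setMaxBackgroundJobs); the thread count and config handling are assumptions.

```java
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.contrib.streaming.state.ConfigurableRocksDBOptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

import java.util.Collection;

public class FlushThreadsOptionsFactory implements ConfigurableRocksDBOptionsFactory {
    private int maxBackgroundFlushes = 2; // assumed value for this sketch

    @Override
    public RocksDBOptionsFactory configure(ReadableConfig configuration) {
        // A real implementation would read a custom config key here.
        return this;
    }

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions,
                                     Collection<AutoCloseable> handlesToClose) {
        // Method named in the thread; newer RocksDB prefers setMaxBackgroundJobs.
        return currentOptions.setMaxBackgroundFlushes(maxBackgroundFlushes);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions,
                                                   Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }
}
```

Presumably this would be registered via the state.backend.rocksdb.options-factory config option, pointing at the factory's fully qualified class name.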

Re: Email from kingdomad

2024-07-14 Thread 张胜军
R. Sent from 139 Mail. The following is the content of the forwarded email From: kingdomad To: user-zh Date: 2024-07-15 09:36:43 Subject: Email from kingdomad (empty)

Re: Encountering scala.matchError in Flink 1.18.1 Query

2024-07-14 Thread Xuyang
Hi, this is a bug fixed in https://github.com/apache/flink/pull/25075/files#diff-4ee2dd065d2b45fb64cacd5977bec6126396cc3b56e72addfe434701ac301efeL405. You can try to join input_table and udtf first, and then use it as the input of window tvf to bypass this bug. -- Best! Xuyang

Re: Email from kingdomad

2024-07-14 Thread kingdomad
-- kingdomad At 2024-07-15 09:36:43, "kingdomad" wrote: >

Email from kingdomad

2024-07-14 Thread kingdomad

Reply: Flink Standalone ZK-HA mode, CLI job submission

2024-07-13 Thread love_h1...@126.com
My guess is that both JMs were writing their own address to ZK's rest_service_lock node at the same time, so some of the Flink client's jobs were submitted to one JM and others to the other JM. This situation can be reproduced by manually modifying the ZK node. The cluster state at the time cannot be fully reproduced just by restarting ZK. The root cause is unclear; has any similar bug been seen? Original message | From | Zhanghao Chen | | Date | 2024-07-13 12:41 | | To | user-zh@flink.apache.org | | Cc | | | Subject | Re: Flink

Re: Buffer Priority

2024-07-12 Thread Zhanghao Chen
Hi Enric, It basically means the prioritized buffers can bypass all non-prioritized buffers at the input gate and get processed first. You may refer to https://issues.apache.org/jira/browse/FLINK-19026 for more details where it is firstly introduced. Best, Zhanghao Chen

Re: Flink Standalone ZK-HA mode, CLI job submission

2024-07-12 Thread Zhanghao Chen
From the logs, a leader switch happened while the ZK cluster was rolling; both JMs became Leader one after another, but they were never Leader at the same time. Best, Zhanghao Chen From: love_h1...@126.com Sent: Friday, July 12, 2024 17:17 To: user-zh@flink.apache.org Subject: Flink Standalone ZK-HA mode, CLI job submission. Version: Flink 1.11.6, Standalone HA mode, ZooKeeper 3.5.8

Re: Taskslots usage

2024-07-12 Thread Aleksandr Pilipenko
Hello Alexandre, Flink does not use a TaskSlot per task by default; rather, a task slot will hold a slice of the entire pipeline (up to 1 subtask of each operator, depending on operator parallelism) [1]. So if your job parallelism is 1, only a single task slot will be occupied. If you
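
A hedged illustration of this point: with the default slot sharing group, occupied slots equal the job's maximum parallelism, not the operator count, and isolating an operator in its own group adds slots. The group name and pipeline are arbitrary.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);

// Source -> map -> sink share the "default" slot sharing group, so this
// pipeline occupies 2 slots (its max parallelism), not 2 slots per operator.
env.fromElements("a", "b", "c")
        .map(String::toUpperCase)
        // Isolating the heavy part in its own group requires 2 extra slots
        // (downstream operators inherit the group of their input).
        .slotSharingGroup("heavy")
        .print();

env.execute("slot-sharing-sketch");
```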
