flink cdc metrics question

2024-04-07 Thread casel.chen
Does Flink CDC expose any monitoring metrics? I would like to monitor the number of binlog records not yet consumed by real-time jobs that use Flink CDC, similar to consumer-lag monitoring for a Kafka topic. I want to use this monitoring to prevent a slow-consuming Flink CDC job from being lapped (how can the maximum binlog count be obtained?)

Re: Impact on using clean code and serializing everything

2024-04-07 Thread Biao Geng
Hi Oscar, I assume the "dependency" in your description refers to the custom fields in the ProcessFunction's implementation. You are right that since ProcessFunction implements the `Serializable` interface, we should make all fields either serializable or transient. As for performance, I have no
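A minimal sketch of the serializable-or-transient rule described above, assuming a hypothetical enrichment use case (the endpoint field and the trivial processElement body are made up): only the serializable configuration travels with the job graph, while the non-serializable client is rebuilt in open() on each task instance.

```java
import java.net.http.HttpClient;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichFunction extends ProcessFunction<String, String> {
    private final String endpoint;        // serializable config, shipped with the job graph
    private transient HttpClient client;  // non-serializable, excluded from serialization

    public EnrichFunction(String endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public void open(Configuration parameters) {
        // Created on the task manager after deserialization, so it never needs to be serializable.
        client = HttpClient.newHttpClient();
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        // Hypothetical: `client` would call `endpoint` here; kept trivial for the sketch.
        out.collect(value + " via " + endpoint);
    }
}
```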

Re: Unsubscribe

2024-04-07 Thread Biao Geng
Hi, If you want to unsubscribe from the user-zh mailing list, please send an email with any content to user-zh-unsubscr...@flink.apache.org . To unsubscribe, send an email with any content to user-zh-unsubscr...@flink.apache.org . Best, Biao Geng 995626544 <995626...@qq.com.invalid> wrote on Sun, Apr 7, 2024 at 16:06: > Unsubscribe > > > > > 995626544 >

Re: How to enable RocksDB native metrics?

2024-04-07 Thread Biao Geng
Hi Lei, You can use the "-D" option in the command line to set configs for a specific job. E.g., `flink run-application -t yarn-application -Djobmanager.memory.process.size=1024m `. See https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/cli/ for more details. Best,
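For the RocksDB native metrics asked about in this thread, the same `-D` mechanism should accept the `state.backend.rocksdb.metrics.*` options, e.g. `flink run-application -t yarn-application -Dstate.backend.rocksdb.metrics.block-cache-usage=true -Dstate.backend.rocksdb.metrics.estimate-num-keys=true myJob.jar` (the jar name is a placeholder; pick whichever metric keys from the configuration page you need, and note that the docs caution that native metrics can affect performance).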

Re: How to enable RocksDB native metrics?

2024-04-07 Thread Lei Wang
I can enable them by adding them to flink-conf.yaml, and that works. However, I don't want to edit the flink-conf.yaml file; I want to enable the configurations when submitting a job on the command line so that they only apply to the job I submitted. I have no idea how to do this. Thanks, Lei On Mon, Apr 8, 2024

Re: How to enable RocksDB native metrics?

2024-04-07 Thread Marco Villalobos
Hi Lei, Have you tried enabling these Flink configuration properties? (Configuration, nightlies.apache.org) Sent from my iPhone. On Apr 7, 2024, at 6:03 PM, Lei Wang wrote: I want to enable it only for specified jobs, how can I specify the configurations on cmd line when submitting a job? Thanks, Lei On

Re: How to enable RocksDB native metrics?

2024-04-07 Thread Lei Wang
I want to enable it only for specified jobs, how can I specify the configurations on cmd line when submitting a job? Thanks, Lei On Sun, Apr 7, 2024 at 4:59 PM Zakelly Lan wrote: > Hi Lei, > > You can enable it by some configurations listed in: >

Re: Debugging Kryo Fallback

2024-04-07 Thread Salva Alcántara
Thanks Yunfeng! That is more or less what I do now when I run into the problem. This approach reports problems one at a time (an exception is raised on the first problem encountered). Instead of that, I think accumulating all the issues and presenting them all at once would be more user friendly.

Re: How to enable RocksDB native metrics?

2024-04-07 Thread Zakelly Lan
Hi Lei, You can enable it via the configuration options listed in: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#rocksdb-native-metrics (RocksDB Native Metrics) Best, Zakelly On Sun, Apr 7, 2024 at 4:59 PM Zakelly Lan wrote: > Hi Lei, > > You can enable it by

Re: How to enable RocksDB native metrics?

2024-04-07 Thread zbz-163
You can take a look at the document: [ https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#rocksdb-native-metrics ] Thanks, Zbz > On Apr 7, 2024 at 13:41, Lei Wang wrote: > > > Using big state and want to do some performance tuning, how can I enable > RocksDB native

How to enable RocksDB native metrics?

2024-04-06 Thread Lei Wang
Using big state and want to do some performance tuning, how can I enable RocksDB native metrics? I am using Flink 1.14.4 Thanks, Lei

Re: Combining multiple stages into a multi-stage processing pipeline

2024-04-06 Thread Yunfeng Zhou
Hi Mark, IMHO, your design of the Flink application is generally feasible. In Flink ML, I have once met a similar design in ChiSqTest operator, where the input data is first aggregated to generate some results and then broadcast and connected with other result streams from the same input

Re: Why doesn't the HBase SQL connector support ARRAY/MAP/ROW types?

2024-04-06 Thread Yunfeng Zhou
It is probably because these complex collections have no directly corresponding data type in HBase, so Flink SQL does not support them directly. One approach is to convert these data types into a string in some format (e.g. json) or serialize them into a byte array, store that in HBase, and parse/deserialize it again when reading. On Mon, Apr 1, 2024 at 7:38 PM 王广邦 wrote: > > Why doesn't the HBase SQL connector (flink-connector-hbase_2.11) support the data types ARRAY, MAP / MULTISET, ROW? >

Combining multiple stages into a multi-stage processing pipeline

2024-04-06 Thread Mark Petronic
I am looking for some design advice for a new Flink application and I am relatively new to Flink - I have one, fairly straightforward Flink application in production so far. For this new application, I want to create a three-stage processing pipeline. Functionally, I am seeing this as ONE long

RE: Rabbitmq connector for Flink v1.18

2024-04-05 Thread Charlotta Westberg via user
Hi, Sorry to be a bother. Are there any updates on this? I was unable to find a 3.0.2(-rc1) here: https://mvnrepository.com/artifact/org.apache.flink/flink-connector-rabbitmq Can we use it somehow already without building it ourselves?  Best regards Charlotta -Original Message- From:

Re: IcebergSourceReader metrics

2024-04-04 Thread Chetas Joshi
Hi Péter, Yes, this is exactly what I was looking for. Thanks! Chetas On Thu, Mar 28, 2024 at 11:19 PM Péter Váry wrote: > Hi Chetas, > Are you looking for this information? > > public IcebergSourceReaderMetrics(MetricGroup metrics, String > fullTableName) { > MetricGroup readerMetrics

Impact on using clean code and serializing everything

2024-04-04 Thread Oscar Perez via user
Hi, We would like to adhere to clean code and expose all dependencies in the constructor of the process functions. In Flink, however, all dependencies passed to process functions must be serializable. Another workaround is to instantiate these dependencies in the open method of the process

Re: How to list operators and see UID

2024-04-03 Thread Asimansu Bera
Hello Oscar, You can use the REST API to fetch the vertex IDs, which I believe map to operator IDs. http://localhost:8081/jobs/55770be021a8887278234d97684b9518/ You need to provide the job ID, which will give you the list of vertices for the job graph: "vertices": [ { "id":

How to list operators and see UID

2024-04-03 Thread Oscar Perez via user
Hei, We are facing an issue with one of the jobs in production where it fails to map state from one deployment to another. I guess the problem is that we failed to set a UID and it relies on the default of deriving one from a hash. Is it possible to see all operators / UIDs at a glance? What is the
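A minimal sketch of the explicit-UID practice relevant here (operator names and uid strings are made up): assigning a stable uid to every stateful operator is what lets state be mapped across deployments instead of depending on the auto-generated hash.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements("a", "b", "a").uid("letters-source").name("Letters Source")
                .map(String::toUpperCase).returns(Types.STRING)
                .uid("uppercase-map").name("Uppercase")
                .print().uid("print-sink");
        env.execute("uid-example");
    }
}
```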

Participate in the ASF 25th Anniversary Campaign

2024-04-03 Thread Brian Proffitt
Hi everyone, As part of The ASF’s 25th anniversary campaign[1], we will be celebrating projects and communities in multiple ways. We invite all projects and contributors to participate in the following ways: * Individuals - submit your first contribution:

Data duplicated with s3 file sink

2024-04-03 Thread Vararu, Vadim
Hi all, I’ve got a Flink job that uses Kinesis as source and S3 files as Sink. The sink rolls at checkpoints and the checkpointing itself is configured as EXACTLY_ONCE. While running, everything looks good and a new bunch of files appear on s3 each minute (checkpoint is each 60s). The problem

Re:Execute Python UDF in Java Flink application

2024-04-03 Thread Xuyang
Hi, Tony. I think it's easy to use Python UDFs with Java. You can find more details here [1][2]. [1] https://flink.apache.org/2020/04/09/pyflink-introducing-python-support-for-udfs-in-flinks-table-api/ [2]
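A minimal sketch of the approach from [1], assuming the PyFlink runtime (flink-python) is on the classpath and that a file `/path/to/my_udfs.py` defining a Python function `add_one` exists (both are hypothetical): the Java Table API registers the Python UDF via DDL and then uses it in SQL.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PythonUdfFromJava {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Ship the Python module that defines the UDF (path is a placeholder).
        tEnv.getConfig().set("python.files", "/path/to/my_udfs.py");
        // Register the Python function 'my_udfs.add_one' under the SQL name add_one.
        tEnv.executeSql(
                "CREATE TEMPORARY SYSTEM FUNCTION add_one AS 'my_udfs.add_one' LANGUAGE PYTHON");
        tEnv.executeSql("SELECT add_one(v) FROM (VALUES (1), (2)) AS t(v)").print();
    }
}
```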

Re: Debugging Kryo Fallback

2024-04-02 Thread Salva Alcántara
FYI Reposted in SO: - https://stackoverflow.com/questions/78265380/how-to-debug-the-kryo-fallback-in-flink On Thu, Mar 28, 2024 at 7:24 AM Salva Alcántara wrote: > I wonder which is the simplest way of troubleshooting/debugging what > causes the Kryo fallback. > > Detecting it is just a matter

Re: join two streams with pyflink

2024-04-02 Thread Biao Geng
Hi Thierry, Your case is not very complex, and I believe all programming-language (e.g. Java, Python, SQL) interfaces of Flink can do that. When using pyflink, you can use the pyflink datastream/table/SQL API. Here are some examples of using the pyflink table api:

Re: How to handle tuple keys with null values

2024-04-02 Thread Hang Ruan
Hi Sachin. I think maybe we could cast the Long as a String to handle the null value. Or, as Asimansu said, try to filter out the null data. Best, Hang Asimansu Bera wrote on Wed, Apr 3, 2024 at 08:35: > Hello Sachin, > > The same issue had been reported in the past and the JIRA was closed without > resolution. >
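A minimal sketch of the cast-to-String workaround suggested above (the Data stub mirrors the type from the question below): String.valueOf maps null to the literal string "null", which cannot collide with a real numeric id, so keyBy never sees a null key field.

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;

public class NullSafeKeySelector implements KeySelector<Data, Tuple2<String, String>> {
    @Override
    public Tuple2<String, String> getKey(Data data) {
        // String.valueOf((Object) null) returns "null", giving nulls a stable, non-null key.
        return Tuple2.of(String.valueOf(data.id), String.valueOf(data.id1));
    }
}

class Data { Long id; Long id1; } // stub of the question's type; id1 may be null
```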

Execute Python UDF in Java Flink application

2024-04-02 Thread Zhou, Tony
Hi everyone, Out of curiosity, I have a high level question with Flink: I have a use case where I want to define some UDFs in Python while have the main logics written in Java. I am wondering how complex it is going to be with this design choice, or even if it is possible with Flink. Thanks,

Re: GCS FileSink Read Timeouts

2024-04-02 Thread Asimansu Bera
Hello Dylan, I'm not an expert. There are many configuration settings (tuning) that can be set via the Flink configuration. Please refer to the second link below, specifically the retry options. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/filesystems/gcs/

Re: How to handle tuple keys with null values

2024-04-02 Thread Asimansu Bera
Hello Sachin, The same issue had been reported in the past and the JIRA was closed without resolution. https://issues.apache.org/jira/browse/FLINK-4823 I see this as a data quality issue. You need to understand what you would like to do with the null value. Either way, it is better to filter out

GCS FileSink Read Timeouts

2024-04-02 Thread Dylan Fontana via user
Hey Flink Users, We've been facing an issue with GCS that I'm keen to hear the community's thoughts or insights on. We're using the GCS FileSystem on a FileSink to write parquets in our Flink app. We're finding sporadic instances of `com.google.cloud.storage.StorageException: Read timed out`

Re: Understanding checkpoint/savepoint storage requirements

2024-04-02 Thread Robert Young
Thank you both for the information! Rob On Thu, Mar 28, 2024 at 7:08 PM Asimansu Bera wrote: > To add more details to it so that it will be clear why access to > persistent object stores for all JVM processes are required for a job graph > of Flink for consistent recovery. > *JoB Manager:* > >

How to handle tuple keys with null values

2024-04-02 Thread Sachin Mittal
Hello folks, I am keying my stream using a Tuple. Example: public class MyKeySelector implements KeySelector<Data, Tuple2<Long, Long>> { @Override public Tuple2<Long, Long> getKey(Data data) { return Tuple2.of(data.id, data.id1); } } Now id1 can have null values. In this case how should I handle this? Right now I am getting

join two streams with pyflink

2024-04-02 Thread Fokou Toukam, Thierry
Hi, I have 2 streams, as seen in this example (6> {"tripId": "275118740", "timestamp": "2024-04-02T06:20:00Z", "stopSequence": 13, "stopId": "61261", "bearing": 0.0, "speed": 0.0, "vehicleId": "39006", "routeId": "747"} 1> {"visibility": 1, "weather_conditions": "clear sky", "timestamp":

Re: Hadoop dependency configuration issue

2024-04-01 Thread Biao Geng
Hi fengqi, The "Hadoop is not in the classpath/dependencies." error means that the classes HDFS needs, such as org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.FileSystem, were not found. If your system environment has Hadoop, the classpath is usually set this way: export HADOOP_CLASSPATH=`hadoop classpath`

Re: Handle exception for an empty Datastream

2024-04-01 Thread Biao Geng
Hi Nida, The StreamExecutionEnvironment#fromCollection(java.util.Collection data) method will check if the input collection is empty and throw the exception you have met if it is. The 'list != null' cannot get rid of the exception but the ' !list.isEmpty() ' should do the trick. Could you please
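A minimal sketch of that guard (the input loader is a made-up placeholder): fromCollection() rejects an empty collection, so the job is only built and submitted when there is data.

```java
import java.util.List;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EmptyListGuard {
    public static void main(String[] args) throws Exception {
        List<String> list = loadInput(); // placeholder for however the list is produced
        if (list != null && !list.isEmpty()) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromCollection(list, Types.STRING).print();
            env.execute("guarded-job");
        } else {
            System.out.println("No input data; skipping job submission.");
        }
    }

    private static List<String> loadInput() {
        return List.of("a", "b");
    }
}
```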

Re: Unsubscribe

2024-04-01 Thread Biao Geng
Hi, To unsubscribe, please send an email with any content to user-zh-unsubscr...@flink.apache.org . Best, Biao Geng CloudFunny wrote on Sun, Mar 31, 2024 at 22:25: > >

Re: Unsubscribe

2024-04-01 Thread Biao Geng
Hi, To unsubscribe, please send an email with any content to user-zh-unsubscr...@flink.apache.org . Best, Biao Geng 戴少 wrote on Mon, Apr 1, 2024 at 11:09: > Unsubscribe > > -- > > Best Regards, > > > > > Original message > | From | wangfengyang | > | Date | Mar 22, 2024 17:28 | > | To | user-zh | > | Subject | Unsubscribe | > Unsubscribe

Re: Unsubscribe

2024-04-01 Thread Biao Geng
Hi, To unsubscribe, please send an email with any content to user-zh-unsubscr...@flink.apache.org . Best, Biao Geng 杨东树 wrote on Sun, Mar 31, 2024 at 20:23: > Requesting to unsubscribe from mail notifications, thanks!

Re: Request to unsubscribe, thanks

2024-04-01 Thread Biao Geng
Hi, To unsubscribe, please send an email with any content to user-zh-unsubscr...@flink.apache.org . Best, Biao Geng wrote on Sun, Mar 31, 2024 at 22:20: > Request to unsubscribe, thanks

Handle exception for an empty Datastream

2024-04-01 Thread Fidea Lidea
Hi Team, I have written a Flink job which reads data into a List and then converts it to a stream. *Example*: public static void main(String[] args) throws Exception { // set up execution environment StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); . . .

Unsubscribe

2024-04-01 Thread 薛礼彬
Unsubscribe

Re: Flink pipeline throughput

2024-04-01 Thread Asimansu Bera
Hello Kartik, For your case, if events ingested per second are 300/60 = 5 and the payload size is 2 KB, the ingestion size per second is 5 * 2 KB = 10 KB. The network buffer size is 32 KB by default. You can also decrease that value to 16 KB.

Re: Flink pipeline throughput

2024-04-01 Thread Kartik Kushwaha
Thank you. I will check and get back on both the suggestions made by Asimansu and Xuyang. I am using Flink 1.17.0 Regards, Kartik On Mon, Apr 1, 2024, 5:13 AM Asimansu Bera wrote: > Hello Karthik, > > You may check the execution-buffer-timeout-interval parameter. This value > is an important

Re: Re: Optimize exact deduplication for tens of billions data per day

2024-04-01 Thread Jeyhun Karimov
Hi Lei, In addition to the valuable suggested options above, maybe you can try to optimize your partitioning function (since you know your data). Maybe sample [subset of] your data if possible and/or check the key distribution, before re-defining your partitioning function. Regards, Jeyhun On

Re: Flink pipeline throughput

2024-04-01 Thread Asimansu Bera
Hello Karthik, You may check the execution-buffer-timeout-interval parameter. This value is an important one for your case. I had a similar issue experienced in the past. https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#execution-buffer-timeout-interval For
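A minimal sketch of tuning that knob in code, under the assumption that latency (not throughput) is the priority; 5 ms is an arbitrary example value:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Flush network buffers after at most 5 ms even if they are not full.
        // The default is 100 ms; 0 flushes after every record at a throughput cost.
        env.setBufferTimeout(5);
        env.fromElements(1, 2, 3).print();
        env.execute("buffer-timeout-example");
    }
}
```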

Re: Reply: Unsubscribe

2024-03-31 Thread Zhanghao Chen
Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe. You can refer to [1] to manage your mailing-list subscription. [1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8 Best, Zhanghao Chen From: 戴少 Sent: Monday, April 1, 2024 11:10 To: user-zh Cc:

Re: Unsubscribe

2024-03-31 Thread Zhanghao Chen
Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe. You can refer to [1] to manage your mailing-list subscription. [1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8 Best, Zhanghao Chen From: zjw Sent: Monday, April 1, 2024 11:05 To: user-zh@flink.apache.org Subject: Unsubscribe

Re: Re:Re: Re: 1.19 custom data source

2024-03-31 Thread Zhanghao Chen
Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe. You can refer to [1] to manage your mailing-list subscription. [1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8 Best, Zhanghao Chen From: 熊柱 <18428358...@163.com> Sent: Monday, April 1, 2024 11:14 To:

Unsubscribe

2024-03-31 Thread 杨作青
Unsubscribe

Re:Re: Re: 1.19 custom data source

2024-03-31 Thread 熊柱
Unsubscribe On 2024-03-28 19:56:06, "Zhanghao Chen" wrote: > If you are using the DataStream API, you can also check whether the newly added DataGen Connector [1] directly meets your test-data generation needs. > > > [1] > https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/datagen/ > > Best, > Zhanghao Chen

Reply: Unsubscribe

2024-03-31 Thread 戴少
Unsubscribe -- Best Regards, Original message | From | 李一飞 | | Date | Mar 14, 2024 00:09 | | To | user-zh-sc.1618840368.ibekedaekejgeemingfn-kurisu_li=163.com, user-zh-subscribe , user-zh | | Subject | Unsubscribe | Unsubscribe

Reply: Unsubscribe

2024-03-31 Thread 戴少
Unsubscribe -- Best Regards, Original message | From | wangfengyang | | Date | Mar 22, 2024 17:28 | | To | user-zh | | Subject | Unsubscribe | Unsubscribe

Unsubscribe

2024-03-31 Thread zjw

Re:Re: Optimize exact deduplication for tens of billions data per day

2024-03-31 Thread Xuyang
Hi, Wang. What about just increasing the parallelism to reduce the number of keys processed per parallelism? Is the distribution of keys uneven? If so, you can use the DataStream API to manually implement some of the optimization methods of Flink SQL. [1] [1]

Re:Flink pipeline throughput

2024-03-31 Thread Xuyang
Hi, Kartik. On the Flink UI, is there any operator that maintains a relatively high busy ratio? Could you also try using a flame graph to provide more information? [1] [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/flame_graphs/ -- Best! Xuyang At 2024-03-30

Unsubscribe

2024-03-31 Thread CloudFunny

Request to unsubscribe, thanks

2024-03-31 Thread wangwj03
Requesting to unsubscribe, thanks

Unsubscribe

2024-03-31 Thread 杨东树
Requesting to unsubscribe from mail notifications, thanks!

Re: flink version stable

2024-03-30 Thread Lincoln Lee
Hi Thierry, The flink connectors have been separated from the main flink repository[1], using separate repositories and release process[2]. For example, https://github.com/apache/flink-connector-kafka for the Kafka connector, and its latest release is v3.1.0[3]. You can follow new releases of

Flink pipeline throughput

2024-03-30 Thread Kartik Kushwaha
Hello, I have a streaming event processing job that looks like this: *Source - ProcessFn (3 in total) - Sink* I am seeing a delay of 50ms to 250ms between each operator (maybe buffering or serde delays), leading to slow end-to-end processing. What could be the reason for such high latency?

Re: Optimize exact deduplication for tens of billions data per day

2024-03-29 Thread Lei Wang
Perhaps I can keyBy(Hash(originalKey) % 10). Then, in the KeyProcessOperator, use MapState instead of ValueState: MapState<OriginalKey, Boolean> mapState. There are about 10 original keys for each mapState. Hope this will help. On Fri, Mar 29, 2024 at 9:24 PM Péter Váry wrote: > Hi Lei, > > Have you tried to
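A minimal sketch of this bucketing idea (the Event stub and the bucket count are made up): key by a hash bucket of the original key, then track the full original keys of that bucket in MapState, so Flink manages far fewer distinct keys. A state TTL of one day could additionally bound the state, matching the original requirement.

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class BucketedDedup extends KeyedProcessFunction<Integer, Event, Event> {
    private transient MapState<String, Boolean> seen;

    @Override
    public void open(Configuration parameters) {
        // One MapState per hash bucket, holding the full original keys of that bucket.
        seen = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("seen", String.class, Boolean.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
        if (!seen.contains(event.key)) { // first occurrence of this original key
            seen.put(event.key, true);
            out.collect(event);
        }
    }
}

class Event { String key; } // stub standing in for the real record type

// Usage sketch: stream.keyBy(e -> Math.floorMod(e.key.hashCode(), 1024))
//                     .process(new BucketedDedup());
```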

Re: flink version stable

2024-03-29 Thread Fokou Toukam, Thierry
I'm asking because I see that the latest version doesn't have all libraries, such as the Kafka connector. Thierry FOKOU | IT M.A.Sc student Département de génie logiciel et TI École de technologie supérieure | Université du Québec 1100, rue Notre-Dame Ouest Montréal (Québec) H3C 1K3

Re: Optimize exact deduplication for tens of billions data per day

2024-03-29 Thread Péter Váry
Hi Lei, Have you tried to make the key smaller, and store a list of found keys as a value? Let's make the operator key a hash of your original key, and store a list of the full keys in the state. You can play with your hash length to achieve the optimal number of keys. I hope this helps, Peter

Re: flink version stable

2024-03-29 Thread Junrui Lee
Hi, The latest stable version of Flink is 1.19.0 > > Fokou Toukam, Thierry > wrote on Fri, Mar 29, 2024 at 16:25: > >> Hi, just want to know which version of flink is stable? >> >> *Thierry FOKOU *| * IT M.A.Sc Student* >> >> Département de génie logiciel et TI >> >> École de technologie

Re: Batch Job with Adaptive Batch Scheduler failing with JobInitializationException: Could not start the JobMaster

2024-03-29 Thread Junrui Lee
Hi Dipak, Regarding question 1, I noticed from the logs that the method createBatchExecutionEnvironment from Beam is being used in your job. IIUC, this method utilizes Flink's DataSet API. If indeed the DataSet API is being used, the configuration option execution.batch-shuffle-mode will not take

Row to tuple conversion in PyFlink when switching to 'thread' execution mode

2024-03-29 Thread Wouter Zorgdrager
Dear readers, I'm running into some unexpected behaviour in PyFlink when switching execution mode from process to thread. In thread mode, my `Row` gets converted to a tuple whenever I use a UDF in a map operation. By this conversion to tuples, we lose critical information such as column names.

Batch Job with Adaptive Batch Scheduler failing with JobInitializationException: Could not start the JobMaster

2024-03-29 Thread Dipak Tandel
Hi Everyone I am facing some issues while running the batch job on a Flink cluster using Adaptive Batch Scheduler. I have deployed a flink cluster on Kubernetes using the flink Kubernetes operator and submitted a job to the cluster using Apache beam FlinkRunner. I am using Flink version 1.16.

Optimize exact deduplication for tens of billions data per day

2024-03-29 Thread Lei Wang
Use RocksDBBackend to store whether the element appeared within the last one day; here is the code: public class DedupFunction extends KeyedProcessFunction { private ValueState<Boolean> isExist; public void open(Configuration parameters) throws Exception { ValueStateDescriptor

Re: One query just for curiosity

2024-03-29 Thread gongzhongqiang
Hi Ganesh, As Zhanghao Chen said before, he advised two solutions for different scenarios: 1. Processing a record is a CPU-bound task: scale up the parallelism of the task and the Flink cluster to improve TPS. 2. Processing a record is an IO-bound task: use Async-IO to reduce resource cost and also get better

Re: IcebergSourceReader metrics

2024-03-29 Thread Péter Váry
Hi Chetas, Are you looking for this information? public IcebergSourceReaderMetrics(MetricGroup metrics, String fullTableName) { MetricGroup readerMetrics = metrics.addGroup("IcebergSourceReader").addGroup("table", fullTableName); this.assignedSplits =

flink version stable

2024-03-28 Thread Fokou Toukam, Thierry
Hi, just want to know which version of flink is stable? Thierry FOKOU | IT M.A.Sc Student Département de génie logiciel et TI École de technologie supérieure | Université du Québec 1100, rue Notre-Dame Ouest Montréal (Québec) H3C 1K3 Tél +1 (438) 336-9007 [image001]

Re: Flink cache support

2024-03-28 Thread Marco Villalobos
Zhanghao is correct. You can use what is called "keyed state". It's like a cache. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/fault-tolerance/state/ > On Mar 28, 2024, at 7:48 PM, Zhanghao Chen wrote: > > Hi, > > You can maintain a cache manually in

Re: One query just for curiosity

2024-03-28 Thread Zhanghao Chen
Yes. However, a huge parallelism would require additional coordination cost so you might need to set up the JobManager with a decent spec (at least 8C 16G by experience). Also, you'll need to make sure there's no external bottlenecks (e.g. reading/writing data from the external storage). Best,

Re: need flink support framework for dependency injection

2024-03-28 Thread Ruibin Xing
Hi Thais, Thanks, that's really detailed and inspiring! I think we can use the same pattern for states too. On Wed, Mar 27, 2024 at 6:40 PM Schwalbe Matthias < matthias.schwa...@viseca.ch> wrote: > Hi Ruibin, > > > > Our code [1] targets a very old version of Flink 1.8, for current >

Re: Flink cache support

2024-03-28 Thread Zhanghao Chen
Hi, You can maintain a cache manually in your operator implementations. You can load the initial cached data on the operator open() method before the processing starts. However, this would set up a cache per task instance. If you'd like to have a cache shared by all processing tasks without
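A minimal sketch of the per-task cache described above (the hard-coded entry stands in for e.g. a JDBC query against the Oracle tables): the cache is populated once per task instance in open(), before any record is processed.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CachedLookup extends RichMapFunction<String, String> {
    private transient Map<String, String> cache;

    @Override
    public void open(Configuration parameters) {
        cache = new HashMap<>();
        // Placeholder: a real job would load the reference tables from Oracle here.
        cache.put("code-1", "value-1");
    }

    @Override
    public String map(String key) {
        return cache.getOrDefault(key, "unknown");
    }
}
```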

Re: One query just for curiosity

2024-03-28 Thread Ganesh Walse
Do you mean to say we can process 32767 records in parallel? And may I know, if this is the case, whether we need to do anything for this? On Fri, 29 Mar 2024 at 8:08 AM, Zhanghao Chen wrote: > Flink can be scaled up to a parallelism of 32767 at max. And if your > record processing is mostly

Re: One query just for curiosity

2024-03-28 Thread Zhanghao Chen
Flink can be scaled up to a parallelism of 32767 at max. And if your record processing is mostly IO-bound, you can further boost the throughput via Async-IO [1]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/ Best, Zhanghao Chen
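A minimal sketch of the Async-IO pattern from [1] (the in-memory "lookup" is a placeholder for a real async client call): up to 100 requests per subtask are in flight concurrently instead of one record blocking at a time.

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;

import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncLookup extends RichAsyncFunction<String, String> {
    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> key + "-enriched") // placeholder for an async client call
                .thenAccept(result -> resultFuture.complete(Collections.singleton(result)));
    }
}

// Usage sketch (stream is an existing DataStream<String>):
// AsyncDataStream.unorderedWait(stream, new AsyncLookup(), 1000,
//         java.util.concurrent.TimeUnit.MILLISECONDS, 100);
```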

IcebergSourceReader metrics

2024-03-28 Thread Chetas Joshi
Hello, I am using Flink to read Iceberg (S3). I have enabled all the metrics scopes in my FlinkDeployment as below metrics.scope.jm: flink.jobmanager metrics.scope.jm.job: flink.jobmanager.job metrics.scope.tm: flink.taskmanager metrics.scope.tm.job: flink.taskmanager.job metrics.scope.task:

One query just for curiosity

2024-03-28 Thread Ganesh Walse
Hi Team, If one record gets processed in 1 second in Flink, then what will be the best time taken to process 1000 records in Flink using maximum parallelism?

Flink cache support

2024-03-28 Thread Ganesh Walse
Hi Team, In my project the requirement is to cache data from the Oracle database, where there are many tables and the same data is required by all the transactions being processed. Can you please suggest an approach where the cache is first loaded into memory and then stream processing should

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Yanfei Lei
Congratulations! Best, Yanfei Zhanghao Chen wrote on Thu, Mar 28, 2024 at 19:59: > > Congratulations! > > Best, > Zhanghao Chen > > From: Yu Li > Sent: Thursday, March 28, 2024 15:55 > To: d...@paimon.apache.org > Cc: dev ; user > Subject: Re: [ANNOUNCE] Apache Paimon is

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Zhanghao Chen
Congratulations! Best, Zhanghao Chen From: Yu Li Sent: Thursday, March 28, 2024 15:55 To: d...@paimon.apache.org Cc: dev ; user Subject: Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project CC the Flink user and dev mailing list. Paimon originated

Re: Re: 1.19 custom data source

2024-03-28 Thread Zhanghao Chen
If you are using the DataStream API, you can also check whether the newly added DataGen Connector [1] can directly meet your test-data generation needs. [1] https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/connectors/datastream/datagen/ Best, Zhanghao Chen From: ha.fen...@aisino.com Sent: Thursday, March 28, 2024 15:34 To:
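A minimal sketch of the DataGen connector from [1] as a SourceFunction replacement for test data, assuming the flink-connector-datagen dependency is on the classpath: it emits 1000 generated strings at 10 records per second.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.connector.source.util.ratelimit.RateLimiterStrategy;
import org.apache.flink.connector.datagen.source.DataGeneratorSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataGenExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataGeneratorSource<String> source = new DataGeneratorSource<>(
                index -> "event-" + index,         // generator function: Long -> String
                1000,                              // total number of records to emit
                RateLimiterStrategy.perSecond(10), // emission rate
                Types.STRING);
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "datagen").print();
        env.execute("datagen-example");
    }
}
```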

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread gongzhongqiang
Congratulations! Best, Zhongqiang Gong Yu Li wrote on Thu, Mar 28, 2024 at 15:57: > CC the Flink user and dev mailing list. > > Paimon originated within the Flink community, initially known as Flink > Table Store, and all our incubating mentors are members of the Flink > Project Management Committee. I am

Re: Re: 1.19 custom data source

2024-03-28 Thread Shawn Huang
Hi, for how to implement the Source interface, you can refer to the following materials: [1] FLIP-27: Refactor Source Interface - Apache Flink - Apache Software Foundation [2] How to efficiently integrate with Flink: Connector / Catalog API core design and community progress (qq.com)

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Rui Fan
Congratulations~ Best, Rui On Thu, Mar 28, 2024 at 3:55 PM Yu Li wrote: > CC the Flink user and dev mailing list. > > Paimon originated within the Flink community, initially known as Flink > Table Store, and all our incubating mentors are members of the Flink > Project Management Committee. I

Re: [ANNOUNCE] Apache Paimon is graduated to Top Level Project

2024-03-28 Thread Yu Li
CC the Flink user and dev mailing list. Paimon originated within the Flink community, initially known as Flink Table Store, and all our incubating mentors are members of the Flink Project Management Committee. I am confident that the bonds of enduring friendship and close collaboration will

Re: 1.19 custom data source

2024-03-28 Thread gongzhongqiang
Hello: The current Flink 1.19 release only marks SourceFunction as deprecated; it will be removed in a future version. So, for your application to support Flink versions in the long run, you can reimplement these SourceFunctions with the new Source interface. ha.fen...@aisino.com wrote on Thu, Mar 28, 2024 at 14:18: > > I used to extend SourceFunction to implement some simple methods for automatically generating data. In 1.19 it has been marked as deprecated, and apparently the Source interface should be used instead, which is completely different from the old SourceFunction. How should I generate a custom data source for test data? >

How to define the termination criteria for iterations in Flink ML?

2024-03-28 Thread Komal M
Hi all, I have another question regarding Flink ML’s iterations. In the documentation it says “The iterative algorithm has an iteration body that is repeatedly invoked until some termination criteria is reached (e.g. after a user-specified number of epochs has been reached).” My question is

Debugging Kryo Fallback

2024-03-28 Thread Salva Alcántara
I wonder which is the simplest way of troubleshooting/debugging what causes the Kryo fallback. Detecting it is just a matter of adding this line to your job: ``` env.getConfig().disableGenericTypes(); ``` or in more recent versions: ``` pipeline.generic-types: false ``` But once you detect
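A minimal sketch of checking individual types ahead of time (MyPojo is a placeholder): Flink falls back to Kryo exactly when type extraction yields a GenericTypeInfo, so each type can be asserted in a unit test, which also makes it easy to collect all offending types at once rather than failing on the first.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;
import org.apache.flink.api.java.typeutils.TypeExtractor;

public class KryoFallbackCheck {
    public static void main(String[] args) {
        TypeInformation<MyPojo> info = TypeExtractor.createTypeInfo(MyPojo.class);
        if (info instanceof GenericTypeInfo) {
            System.out.println(MyPojo.class + " would fall back to Kryo: " + info);
        } else {
            System.out.println(MyPojo.class + " is serialized as: " + info);
        }
    }

    // Placeholder type to check; swap in the classes used by your job.
    public static class MyPojo {
        public long id;
        public String name;
    }
}
```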

Re: Understanding checkpoint/savepoint storage requirements

2024-03-28 Thread Asimansu Bera
To add more details so that it is clear why access to persistent object stores is required for all JVM processes of a Flink job graph for consistent recovery. Job Manager: Flink's JobManager writes critical metadata during checkpoints for fault tolerance: - Job

Re: Flink job unable to restore from savepoint

2024-03-27 Thread prashant parbhane
flink version 1.17 Didn't change any job configuration. We are facing this below issue. https://issues.apache.org/jira/browse/FLINK-23886 On Wed, Mar 27, 2024 at 1:39 AM Hangxiang Yu wrote: > Hi, Prashant. > Which Flink version did you use? > And Did you modify your job logic or configurations

Re:Understanding checkpoint/savepoint storage requirements

2024-03-27 Thread Feifan Wang
Hi Robert: Your understanding is right! To add some more information: the JobManager is not only responsible for cleaning up old checkpoints, but also needs to write a metadata file to checkpoint storage after all taskmanagers have taken their snapshots. --- Best, Feifan Wang

Re:[discuss] [jdbc] Support Ignore deleting is required?

2024-03-27 Thread Xuyang
Hi, ouywl. IMO, +1 for this option. You can start a discussion on the dev mailing list[1] to seek more input from more community developers. [1] d...@flink.apache.org -- Best! Xuyang At 2024-03-27 11:28:37, "ou...@139.com" wrote: When using the jdbc sink connector, there are a

Understanding checkpoint/savepoint storage requirements

2024-03-27 Thread Robert Young
Hi all, I have some questions about checkpoint and savepoint storage. From what I understand, a distributed, production-quality job with a lot of state should use durable shared storage for checkpoints and savepoints. All job managers and task managers should access the same volume. So typically

Re: End-to-end lag spikes when closing a large number of panes

2024-03-27 Thread Robert Metzger
Hey Caio, Your analysis of the problem sounds right to me, I don't have a good solution for you :( I’ve validated that CPU profiles show clearAllState using a significant > amount of CPU. Did you use something like async-profiler here? Do you have more info on the breakdown into what used the

Re: Discussion thread : Proposal to add Conditions in Flink CRD's Status field

2024-03-27 Thread Ahmed Hamdy
I am sorry it should be "d...@flink.apache.org" Best Regards Ahmed Hamdy On Wed, 27 Mar 2024 at 13:00, Ahmed Hamdy wrote: > Hi Lajith, > Could you please open the discussion thread against "d...@apache.flink.org", > I believe it is better suited there. > Best Regards > Ahmed Hamdy > > > On

Re: Discussion thread : Proposal to add Conditions in Flink CRD's Status field

2024-03-27 Thread Ahmed Hamdy
Hi Lajith, Could you please open the discussion thread against "d...@apache.flink.org", I believe it is better suited there. Best Regards Ahmed Hamdy On Wed, 27 Mar 2024 at 05:33, Lajith Koova wrote: >  > > Hello, > > > Starting discussion thread here to discuss a proposal to add Conditions >

RE: need flink support framework for dependency injection

2024-03-27 Thread Schwalbe Matthias
Hi Ruibin, Our code [1] targets a very old version of Flink, 1.8; for current development my employer hasn't decided (yet?) to contribute it to the public. That old code does not yet contain the abstractions for setting up state primitives, so let me sketch it here: * Derive a specific

Error when using FlinkML iterations with KeyedCoProcessFunction

2024-03-27 Thread Komal M
Hi, As the DataStream API's iterativeStream method has been deprecated for future Flink releases, the documentation recommends using Flink ML's iteration as an alternative. I am trying to build my understanding of the new iterations API as it will be a requirement for our future projects. As

Re: need flink support framework for dependency injection

2024-03-27 Thread Ruibin Xing
Hi Thias, Could you share your approach to job setup using Spring, if that's possible? We also use Spring Boot for DI in jobs, primarily relying on profiles. I'm particularly interested in how you use the same job structures for different scenarios, such as reading savepoints. Thank you very
