Re: flink run命令是否支持读取远程文件系统中的jar文件?

2021-04-22 Thread JasonLee
hi session ,per-job 模式是不支持的 application 模式是支持的 - Best Wishes JasonLee -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink job消费kafka 失败,无法拿到offset值

2021-04-22 Thread Qingsheng Ren
你好 Jacob, 从错误上来看是 Kafka Consumer 没有连上 Kafka Brokers。这些方法可能帮助排查问题: 1. 确认 Flink TaskManager 和 Kafka Broker 之间的网络连通性。 2. Flink TaskManager 与 Kafka Broker 之间网络连通并不意味着能够消费数据,可能需要修改 Kafka Broker 的配置。这篇文章[1] 或许会有帮助,绝大多数 Kafka 的连接问题是由于文章中描述的配置问题导致的。 3. 配置 Log4j 将 org.apache.kafka.clients.consumer 的 Log

Re: Debezium CDC | OOM

2021-04-22 Thread Ayush Chauhan
Hi Matthias, I am using RocksDB as a state backend. I think the iceberg sink is not able to propagate back pressure to the source which is resulting in OOM for my CDC pipeline. Please refer to this - https://github.com/apache/iceberg/issues/2504 On Thu, Apr 22, 2021 at 8:44 PM Matthias Pohl

Re: 关于upsert-kafka connector的问题

2021-04-22 Thread Shengkai Fang
如果数据在upsert-kafka中已经做到了按序存储(相同key的数据放在同一个partition内),那么flink消费的时候可以做到保序。 Best, Shengkai

多个复杂算子保证精准一次性

2021-04-22 Thread Colar
您好, 我有如下代码: datastream.process(new Process1()).process(new Process2())… 这些Process可能有些复杂的计算操作 请问,如果我要保证端到端的精准一次性,我应该在所有的算子上都维护一个状态还是只在最后一个算子维护状态?

Re: Official flink java client

2021-04-22 Thread Yun Gao
Hi gaurav, Logicall Flink client is bear inside the StreamExecutionEnvironment, and users could use the StreamExecutionEnvironment to execute their jobs. Could you share more about why you want to directly use the client? Best, Yun --Original Mail --

Re: Re:回复: flink sql消费kafka join普通表为何会性能爬坡?

2021-04-22 Thread Xi Shen
我这边有使用jdbc table属性加了本地缓存 尝试把cache size设置为400/2/4,然后重启,消费kafka速度都是需要慢慢上涨 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-22 Thread 马阳阳
Hi Matthias, We have “solved” the problem by tuning the join. But I still try to answer the questions, hoping this will help. * What is the option you're referring to for the bounded shuffle? That might help to understand what streaming mode solution you're looking for. |

flink native k8s ????????

2021-04-22 Thread ??
flink 1.12.2 Native K8s, ./bin/kubernetes-session.sh \ -Dkubernetes.namespace=flink-session-cluster \ -Dkubernetes.jobmanager.service-account=flink \ -Dkubernetes.cluster-id=session001 \ -Dtaskmanager.memory.process.size=1024m \ -Dkubernetes.taskmanager.cpu=1 \

Re:?????? flink sql????kafka join??????????????????????

2021-04-22 Thread Michael Ran
?? ?? 2021-04-22 11:21:55??"" ?? >Tidb??Tidb??TiDBstructured-streaming?? >?? > > > >

Re:flink mysql cdc????

2021-04-22 Thread Michael Ran
CDCbinlog ?? 2021-04-22 14:22:18??"" <1353637...@qq.com> ?? >??flink mysql cdc >1.flink mysql

Re: [ANNOUNCE] Flink Jira Bot fully live (& Useful Filters to Work with the Bot)

2021-04-22 Thread Xintong Song
Thanks for driving this, Konstantin. Great job~! Thank you~ Xintong Song On Thu, Apr 22, 2021 at 11:57 PM Matthias Pohl wrote: > Thanks for setting this up, Konstantin. +1 > > On Thu, Apr 22, 2021 at 11:16 AM Konstantin Knauf > wrote: > >> Hi everyone, >> >> all of the Jira Bot rules are

?????? ????upsert-kafka connector??????

2021-04-22 Thread op
??upsert-kafka??key ---- ??: "user-zh"

Official flink java client

2021-04-22 Thread gaurav kulkarni
Hi,  Is there any official flink client in java that's available? I came across RestClusterClient, but I am not sure if its official. I can create my own client, but just wanted to check if there is anything official available already that I can leverage.  Thanks,Gaurav | | | | | | |

Re:请问在使用processfunction 中的processelement()和onTimer()需要考虑并发问题吗?

2021-04-22 Thread 李一飞
这两方法是同步的方式执行的,同时只能执行一个 在 2021-04-22 15:35:07,"x2009438" 写道: >如题,谢谢各位。 > > >发自我的iPhone

Re: 疑问:当开启state.backend.incremental 后 Checkpointed Data Size 会不断变大

2021-04-22 Thread HunterXHunter
没解决,我只能把它关闭了 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
<< Added the Fliter upfront as below, the pipe has no issues. Also metrics show that no data is being pushed through the sideoutput and that data in not pulled from the a simulated sideout ( below ) >> Added the Fliter upfront as below, the pipe has no issues. Also metrics show that no data

Re: Multiple jobs in the same Flink project

2021-04-22 Thread Arvid Heise
Hi Oğuzhan, I think you know the answer already: it's easiest to have 1 jar per application. And in most cases, it's easiest to also have 1 repo per application. You can use the same template for all 3 and all future applications without any special cases. My rule of thumb is the following: if

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
And when I added the filter the Exception was not thrown. So the sequence of events * Increased lateness from 12 ( that was what it was initially running with ) to 24 hours * the pipe ran as desired before it blew up with the Exception * masked the issue by increasing the lateness to 48 hours. *

Re: Question about snapshot file

2021-04-22 Thread Abdullah bin Omar
Hi, I have a savepoint or checkpointed file from my task. However, the file is binary. I want to see what the file contains. How is it possible to see what information the file has (or how it is possible to make it human readable?) Thank you On Thu, Apr 22, 2021 at 10:19 AM Matthias Pohl

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
Yes sir. The allowedLateNess and side output always existed. On Thu, Apr 22, 2021 at 11:47 AM Matthias Pohl wrote: > You're saying that you used `allowedLateness`/`sideOutputLateData` as > described in [1] but without the `LateEventFilter`/`LateEventSideOutput` > being added to your pipeline

Multiple jobs in the same Flink project

2021-04-22 Thread Oğuzhan Mangır
We have a flink project with multiple jobs. That means we can submit multiple job with the same jar. But there is a limitation here i think. Because, let's assume; I create a flink project with 3 jobs, and create a single jar then put it to the flink cluster (all of these steps are working on a

flink run命令是否支持读取远程文件系统中的jar文件?

2021-04-22 Thread casel.chen
flink run是否支持读取远程文件系统,例如oss://或hdfs://路径下的jar文件?看源码是需要构建PakcagedProgram,而它的构造函数中有一个File jarFile参数。不知是否能够从oss路径或hdfs路径构建出File对象。

Re: [ANNOUNCE] Flink Jira Bot fully live (& Useful Filters to Work with the Bot)

2021-04-22 Thread Matthias Pohl
Thanks for setting this up, Konstantin. +1 On Thu, Apr 22, 2021 at 11:16 AM Konstantin Knauf wrote: > Hi everyone, > > all of the Jira Bot rules are live now. Particularly in the beginning the > Jira Bot will be very active, because the rules apply to a lot of old, > stale tickets. So, if you

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Matthias Pohl
You're saying that you used `allowedLateness`/`sideOutputLateData` as described in [1] but without the `LateEventFilter`/`LateEventSideOutput` being added to your pipeline when running into the UnsupportedOperationException issue previously? [1]

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
As in this is essentially doing what lateness *should* have done And I think that is a bug. My code now is . Please look at the allowedLateness on the session window. SingleOutputStreamOperator> filteredKeyedValue = keyedValue .process(new LateEventFilter(this.lateNessInMinutes*60*1000l)).name(

Re: Question about snapshot file

2021-04-22 Thread Matthias Pohl
Hi Abdullah, the metadata file contains handles to the operator states of the checkpoint [1]. You might want to have a look into the State Processor API [2]. Best, Matthias [1]

Re: Debezium CDC | OOM

2021-04-22 Thread Matthias Pohl
Hi Ayush, Which state backend have you configured [1]? Have you considered trying out RocksDB [2]? RocksDB might help with persisting at least keyed state. Best, Matthias [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend [2]

Question about snapshot file

2021-04-22 Thread Abdullah bin Omar
Hi, (1) what 's the snapshot metadata file (binary) contains ? is it possible to read the snapshot metadata file by using Flink Deserialization? (2) is there any function that can be used to see the previous states on time of operation? Thank you

Re: Kubernetes Setup - JM as job vs JM as deployment

2021-04-22 Thread Matthias Pohl
Hi Gil, I'm not sure whether I understand you correctly. What do you mean by deploying the job manager as "job" or "deployment"? Are you referring to the different deployment modes, Flink offers [1]? These would be independent of Kubernetes. Or do you wonder what the differences are between the

Re: Long to Timestamp(3) Conversion

2021-04-22 Thread Matthias Pohl
Hi Aeden, there are some improvements to time conversions coming up in Flink 1.13. For now, the best solution to overcome this is to provide a user-defined function. Hope, that helps. Best, Matthias On Wed, Apr 21, 2021 at 9:36 PM Aeden Jameson wrote: > I've probably overlooked something

回复:Flink 1.12.0 隔几个小时Checkpoint就会失败

2021-04-22 Thread 田向阳
唉,这个问题着实让人头大,我现在还没找到原因。你这边确定了跟我说一声哈 | | 田向阳 | | 邮箱:lucas_...@163.com | 签名由 网易邮箱大师 定制 在2021年04月22日 20:56,张锴 写道: 你好,我也遇到了这个问题,你的checkpoint是怎么配置的,可以参考一下吗 Haihang Jing 于2021年3月23日周二 下午8:04写道: > 你好,问题定位到了吗? > 我也遇到了相同的问题,感觉和checkpoint interval有关 > 我有两个相同的作业(checkpoint interval >

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
The only thing I can think of is to add the lateness configured to the filter as in here, as in the time on the element + lateness should always be greater then the current WM. As in the current issue is Mon Apr 19 20:46:20 EDT 2021. Window end Wed Apr 21 21:09:02 EDT 2021, WM an event

Re: MemoryStateBackend Issue

2021-04-22 Thread Matthias Pohl
Hi Milind, I bet someone else might have a faster answer. But could you provide the logs and config to get a better understanding of what your issue is? In general, the state is maintained even in cases where a TaskManager fails. Best, Matthias On Thu, Apr 22, 2021 at 5:11 AM Milind Vaidya

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
I can do that, but I am not certain this is the right filter. Can you please validate. That aside I already have the lateness configured for the session window ( the normal withLateNess() ) and this looks like a session window was not collected and still is alive for some reason ( a flink bug ?

Re: Flink Hadoop config on docker-compose

2021-04-22 Thread Matthias Pohl
I think you're right, Flavio. I created FLINK-22414 to cover this. Thanks for bringing it up. Matthias [1] https://issues.apache.org/jira/browse/FLINK-22414 On Fri, Apr 16, 2021 at 9:32 AM Flavio Pompermaier wrote: > Hi Yang, > isn't this something to fix? If I look at the documentation at

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Matthias Pohl
Hi Vishal, based on the error message and the behavior you described, introducing a filter for late events is the way to go - just as described in the SO thread you mentioned. Usually, you would collect late events in some kind of side output [1]. I hope that helps. Matthias [1]

Re: Flink Statefun Python Batch

2021-04-22 Thread Igal Shilman
Hi Tim, I've created a tiny PoC, let me know if this helps, I can't guarantee tho, that this is how we'll eventually approach this, but it should be somewhere along these lines. https://github.com/igalshilman/flink-statefun/tree/tim Thanks, Igal. On Thu, Apr 22, 2021 at 6:53 AM Timothy Bess

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-22 Thread dhanesh arole
Hi, Questions that @matth...@ververica.com asked are very valid and might provide more leads. But if you haven't already then it's worth trying to use jemalloc / tcmalloc. We had similar problems with slow growth in TM memory resulting in pods getting OOMed by k8s. After switching to jemalloc,

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
Well it was not a solution after all. We now have a session window that is stuck with the same issue albeit after the additional lateness. I had increased the lateness to 2 days and that masked the issue which again reared it's head after the 2 days ;lateness was over ( instead of the 1 day )

Re: Flink Hadoop config on docker-compose

2021-04-22 Thread Flavio Pompermaier
Great! Thanks for the support On Thu, Apr 22, 2021 at 2:57 PM Matthias Pohl wrote: > I think you're right, Flavio. I created FLINK-22414 to cover this. Thanks > for bringing it up. > > Matthias > > [1] https://issues.apache.org/jira/browse/FLINK-22414 > > On Fri, Apr 16, 2021 at 9:32 AM Flavio

Re: Flink 1.12.0 隔几个小时Checkpoint就会失败

2021-04-22 Thread 张锴
你好,我也遇到了这个问题,你的checkpoint是怎么配置的,可以参考一下吗 Haihang Jing 于2021年3月23日周二 下午8:04写道: > 你好,问题定位到了吗? > 我也遇到了相同的问题,感觉和checkpoint interval有关 > 我有两个相同的作业(checkpoint interval > 设置的是3分钟),一个运行在flink1.9,一个运行在flink1.12,1.9的作业稳定运行,1.12的运行5小时就会checkpoint > 制作失败,抛异常 org.apache.flink.util.FlinkRuntimeException:

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
I saw https://stackoverflow.com/questions/57334257/the-end-timestamp-of-an-event-time-window-cannot-become-earlier-than-the-current. and this seems to suggest a straight up filter, but I am not sure how does that filter works as in would it factor is the lateness when filtering ? On Thu, Apr 22,

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-22 Thread Matthias Pohl
Hi, I have a few questions about your case: * What is the option you're referring to for the bounded shuffle? That might help to understand what streaming mode solution you're looking for. * What does the job graph look like? Are you assuming that it's due to a shuffling operation? Could you

Re: 关于upsert-kafka connector的问题

2021-04-22 Thread Shengkai Fang
Hi, 请问是有什么具体的问题吗? Best, Shengkai op <520075...@qq.com> 于2021年4月22日周四 下午6:05写道: > 用 upsert-kafka connector 作为source,会有key的插入和更新出现乱序导致结果不准的问题吗? > 谢谢

????upsert-kafka connector??????

2021-04-22 Thread op
?? upsert-kafka connector source??key??

flink1.12.2,interval join并没有 inProcessingTime() and inEventTime()

2021-04-22 Thread tianxy
FLIP-134: Batch execution for the DataStream API Allow explicitly configuring time behaviour on KeyedStream.intervalJoin() FLINK-19479 Before Flink 1.12 the

flink1.12.2使用rocksdb状态后端,checkpoint size变大

2021-04-22 Thread tianxy
452 COMPLETED 103/103 2021-04-22 17:29:12 2021-04-22 17:29:12 325ms 4.40 MB 0 B (5.39 KB) 451 COMPLETED 103/103 2021-04-22 17:28:12 2021-04-22 17:28:12 122ms 4.43 MB 9.36 KB (15.2 KB) 450 COMPLETED 103/103 2021-04-22 17:27:12 2021-04-22 17:27:12 124ms

Re: 疑问:当开启state.backend.incremental 后 Checkpointed Data Size 会不断变大

2021-04-22 Thread tianxy
你好 我也遇到了 所以这个问题你知道原因了没 -- Sent from: http://apache-flink.147419.n8.nabble.com/

[ANNOUNCE] Flink Jira Bot fully live (& Useful Filters to Work with the Bot)

2021-04-22 Thread Konstantin Knauf
Hi everyone, all of the Jira Bot rules are live now. Particularly in the beginning the Jira Bot will be very active, because the rules apply to a lot of old, stale tickets. So, if you get a huge amount of emails from the Flink Jira Bot right now, this will get better. In any case, the Flink Jira

请问在使用processfunction 中的processelement()和onTimer()需要考虑并发问题吗?

2021-04-22 Thread x2009438
如题,谢谢各位。 发自我的iPhone

flink mysql cdc????

2021-04-22 Thread ????
??flink mysql cdc 1.flink mysql cdc??mysql??binlog??mysql

Re: flink 1.12.2 sql-cli 写入Hive报错 is_generic

2021-04-22 Thread Rui Li
可以发一下具体的SQL语句么(包括DDL和insert)? On Wed, Apr 21, 2021 at 5:46 PM HunterXHunter <1356469...@qq.com> wrote: > 在ddl的时候设置了 watermark。在任务页面查看watermark的时候一直没有更新watermark > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/ > -- Best regards! Rui Li

??????flink sql cdc????kafka????????????????????

2021-04-22 Thread ????
flink-cdcSourceRecord??SourceRecord??topic?? ??Debezium mysql-conectorkafka-connectortopic?? ?? ??+??+topic??

Re: 回复: flink sql消费kafka join普通表为何会性能爬坡?

2021-04-22 Thread Xi Shen
Cache设置大小为2w,超时时间为2h 实际上整个表大小为3w左右,考虑到整个表实际只有十几兆。我会尝试cache size设置为4w,保证整个表都能装进cache里。看会不会好一点 但是我查到现在怀疑跟savepoint有关: - 如果我设置kafka offset=earliest,不带savepoint重启,flink job启动消费时,lag有5000w左右,但是1分钟内就能达到约7k/s的消费速度。如下图,job在14:31启动,前面的速度特别大是因为offset重置,但是在14:33已经达到7.5k的消费速度

Re: 回复: flink sql消费kafka join普通表为何会性能爬坡?

2021-04-22 Thread Xi Shen
读JDBC table是有缓存的,看了源码,是用Guava cache实现 文档上说,整个Task Manager进程共享使用一个Cache,所以应该和广播的效果是一样的?所以应该不是查询TiDB导致的性能问题 -- Sent from: http://apache-flink.147419.n8.nabble.com/

Re:回复:flink sql cdc发到kafka消息表名信息缺失问题

2021-04-22 Thread casel.chen
我的疑问正是flink cdc集成debezium后为何会把原始信息弄丢失了?直接采用原生的debezium或者canal同步数据固然可以。但如果flink cdc直接能发出来的话不就可以节省这些组件和运维么?flink cdc设计的初衷也是如此。 在 2021-04-22 11:01:22,"飞翔" 写道: 既然这样,为何要用flink去同步信息,把信息的原始信息都丢失了。你可以直接采用原生的debezium或者canal同步数据,发送kafka, 比如canal的样例,虽然after

Re: Multiple select queries in single job on flink table API

2021-04-22 Thread tbud
/"TableResult result1 = stmtSet.execute(); result1.print();"/ I tried this, and the result is following : Job has been submitted with JobID 4803aa5edc31b3ddc884f922008c5c03 +++ |

Re:Re: flink流、批场景下kafka拉取速率问题:每批次拉取多少条?是动态吗还是可以设置

2021-04-22 Thread 李一飞
明白了,谢谢~ 在 2021-04-21 19:58:23,"Peihui He" 写道: >fetch.min.bytes >fetch.wait.max.ms >还可以用着两个参数控制下的 > >熊云昆 于2021年4月21日周三 下午7:10写道: > >> 有个这个参数max.poll.records,表示一次最多获取多少条record数据,默认是500条 >> >> >> | | >> 熊云昆 >> | >> | >> 邮箱:xiongyun...@163.com >> | >> >> 签名由 网易邮箱大师 定制 >> >> 在2021年04月20日 18:19,李一飞

Re:回复:flink流、批场景下kafka拉取速率问题:每批次拉取多少条?是动态吗还是可以设置

2021-04-22 Thread 李一飞
谢谢 在 2021-04-21 19:10:17,"熊云昆" 写道: >有个这个参数max.poll.records,表示一次最多获取多少条record数据,默认是500条 > > >| | >熊云昆 >| >| >邮箱:xiongyun...@163.com >| > >签名由 网易邮箱大师 定制 > >在2021年04月20日 18:19,李一飞 写道: >flink流、批场景下kafka拉取速率问题:每批次拉取多少条?是动态吗还是可以设置 >最好分流、批场景回答一下,谢谢!