Re: 1.15.2 jobs frequently (every few tens of minutes to an hour) report LocalTransportException: readAddress(..) failed: Connection timed out.

2022-12-07 Thread yidan zhao
This currently looks similar to https://issues.apache.org/jira/browse/FLINK-19249 and https://issues.apache.org/jira/browse/FLINK-16030: an unstable network environment. The same configuration runs fine on physical machines. yidan zhao wrote on Wed, Dec 7, 2022 at 16:11: > > Thanks, but these parameters don't have much to do with netty. > The heartbeat and akka ones may relate to RPC timeouts, but my error comes from netty, not the RPC layer. > The web and rest ones should relate to client job submission. > > Stan1005

Exceeded Checkpoint tolerable failure

2022-12-07 Thread Madan D via user
Hello All, I am seeing the below issue after upgrading from 1.9.0 to 1.14.2 while publishing messages to Pub/Sub, which is causing frequent job restarts and slow processing. Can you please help me? `Caused by: org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure

Re: Cannot specify third-party jars when submitting a job

2022-12-07 Thread melin li
If it is a jar the job depends on, you can build a flat (fat) jar. There are two scenarios: 1. In SQL jobs, the user depends on some connector jar that the platform does not provide, so the user has to upload it; 2. Custom UDF management, where the dependent jars need to be submitted together with the job. yuxia wrote on Thu, Dec 8, 2022 at 10:06: > Why do you say the dependent third-party jars cannot be submitted? Can't the user's job just bundle those jars in? Or do you mean SQL jobs? > > Best regards, > Yuxia > > > From: "melin li" > To: "user-zh" > Sent: Thursday, 2022-12

Re: How to set disableChaining like streaming multiple INSERT statements in a StatementSet ?

2022-12-07 Thread yuxia
Could you please post an image of the running job graph from the Flink UI? Best regards, Yuxia From: "hjw" To: "User" Sent: Thursday, December 8, 2022, 12:05:00 AM Subject: How to set disableChaining like streaming multiple INSERT statements in a StatementSet ? Hi, I create a StatementSet that

Re: Cannot specify third-party jars when submitting a job

2022-12-07 Thread yuxia
Why do you say the dependent third-party jars cannot be submitted? Can't the user's job just bundle those jars in? Or do you mean SQL jobs? Best regards, Yuxia From: "melin li" To: "user-zh" Sent: Thursday, December 8, 2022, 9:46:45 AM Subject: Cannot specify third-party jars when submitting a job. The client cannot submit the third-party jars a flink job depends on, such as custom function jars or connector jars the SQL depends on; they have to be put in place in advance. A Flink-based platform needs to add jars dynamically. A current workaround is to place the dependent jars,

Cannot specify third-party jars when submitting a job

2022-12-07 Thread melin li
The client cannot submit the third-party jars a flink job depends on, such as custom function jars or connector jars the SQL depends on; they have to be put in place in advance. A Flink-based platform needs to add jars dynamically. A current workaround is to add the dependent jars into the job jar's lib directory; getJobJarAndDependencies then retrieves the dependent jars from that jar. This is not very convenient. A parameter could be added to specify the dependent jars; Flink's design has all sorts of quirks here.

Deterministic co-results

2022-12-07 Thread Salva Alcántara
It's well known that Flink does not provide any guarantees on the order in which a CoProcessFunction (or one of its multiple variations) processes its two inputs [1]. I wonder, then, what the current best practice/recommended approach is for cases where one needs deterministic results in the presence
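A commonly suggested workaround for this is to buffer elements from both inputs in keyed state and only process them in timestamp order once the watermark guarantees no earlier element can still arrive. In a real job this logic would live in a KeyedCoProcessFunction, with the buffer in MapState and the flush driven by event-time timers; the plain-Java class below is only a sketch of the ordering idea, not Flink API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Buffers elements from two (or more) inputs and releases them in
// timestamp order once a watermark has passed their timestamp.
public class OrderedTwoInputBuffer {
    // timestamp -> elements seen so far, flushed smallest-timestamp first
    private final TreeMap<Long, List<String>> buffer = new TreeMap<>();

    // Called for every element from either input.
    public void onElement(long timestamp, String value) {
        buffer.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(value);
    }

    // Emit everything with timestamp <= watermark, in timestamp order.
    public List<String> onWatermark(long watermark) {
        List<String> out = new ArrayList<>();
        for (Iterator<Map.Entry<Long, List<String>>> it = buffer.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<Long, List<String>> e = it.next();
            if (e.getKey() > watermark) {
                break; // TreeMap iterates in ascending key order
            }
            out.addAll(e.getValue());
            it.remove();
        }
        return out;
    }
}
```

Determinism then follows from event time rather than arrival order: whichever input delivers first, elements are emitted sorted by timestamp (ties need an additional deterministic tiebreaker, e.g. input id).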

Re: Accessing kafka message key from a KafkaSource

2022-12-07 Thread Yaroslav Tkachenko
Hi Noel, It's definitely possible. You need to implement a custom KafkaRecordDeserializationSchema: its "deserialize" method gives you a ConsumerRecord as an argument, so you can extract the Kafka message key, headers, timestamp, etc. Then pass that when you create a KafkaSource via
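A sketch of the approach Yaroslav describes (not compiled against a specific Flink version, so check the exact interface for yours); the `KeyedMessage` POJO and its constructor are made up for the example:

```java
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Custom deserializer that keeps the Kafka key and timestamp alongside the value.
public class KeyAwareDeserializer implements KafkaRecordDeserializationSchema<KeyedMessage> {

    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<KeyedMessage> out) {
        // The full ConsumerRecord is available here: key(), value(), headers(), timestamp(), ...
        String key = record.key() == null ? null : new String(record.key(), StandardCharsets.UTF_8);
        String value = new String(record.value(), StandardCharsets.UTF_8);
        out.collect(new KeyedMessage(key, value, record.timestamp()));
    }

    @Override
    public TypeInformation<KeyedMessage> getProducedType() {
        return TypeInformation.of(KeyedMessage.class);
    }
}

// Wired into the source with:
//   KafkaSource.<KeyedMessage>builder()
//       .setDeserializer(new KeyAwareDeserializer())
//       ...
```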

SNI issue

2022-12-07 Thread Jean-Damien Hatzenbuhler via user
Hello, When using the JobManager API behind an HTTPS proxy that relies on SNI to route traffic, I run into an issue because the Flink CLI doesn't send the SNI when calling the API over HTTPS. Have other users faced this issue? Regards

How to set disableChaining like streaming multiple INSERT statements in a StatementSet ?

2022-12-07 Thread hjw
Hi, I create a StatementSet that contains multiple INSERT statements. I found that the multiple INSERT tasks are organized into one operator chain when StatementSet.execute() is invoked. How can I set disableChaining, as in the streaming API, for multiple INSERT statements in the StatementSet API? env:
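One knob that does exist for Table/SQL jobs is globally disabling operator chaining via the `pipeline.operator-chaining` configuration option, since StatementSet offers no per-operator disableChaining() like the DataStream API. A configuration fragment, sketched under the assumption of a recent Flink version (it disables chaining for the whole pipeline, not a single operator):

```java
// Disable chaining for the whole job before building the environment:
Configuration conf = new Configuration();
conf.setString("pipeline.operator-chaining", "false");
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);

// Or, for a pure Table job:
// tableEnv.getConfig().getConfiguration().setString("pipeline.operator-chaining", "false");
```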

Re: ZLIB Vulnerability Exposure in Flink statebackend RocksDB

2022-12-07 Thread Yanfei Lei
Hi Vidya Sagar, Thanks for bringing this up. The RocksDB state backend defaults to Snappy [1]. If the compression option is not explicitly configured, this ZLIB vulnerability has no effect on Flink applications for the time being. *> is there any plan in the coming days to address this?
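If you want to pin the compression type explicitly rather than rely on the default, the state backend accepts a RocksDBOptionsFactory. A sketch, assuming Flink's EmbeddedRocksDBStateBackend and the bundled RocksDB Java API (method names should be double-checked against your Flink version):

```java
import java.util.Collection;

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

// Forces Snappy compression on all column families, regardless of defaults.
public class SnappyOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions; // DB-level options unchanged
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions.setCompressionType(CompressionType.SNAPPY_COMPRESSION);
    }
}

// Usage sketch:
//   EmbeddedRocksDBStateBackend backend = new EmbeddedRocksDBStateBackend();
//   backend.setRocksDBOptions(new SnappyOptionsFactory());
//   env.setStateBackend(backend);
```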

Detecting when a job is "caught up"

2022-12-07 Thread Kurtis Walker via user
I’ve got a Flink job that uses a HybridSource. It reads a bunch of S3 files and then transitions to a Kafka topic, where it stays processing live data. Everything gets written to a warehouse where users build and run reports. It takes about 36 hours to read the data from the beginning before it’s
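One common heuristic for "caught up" is to watch the event-time lag of emitted records (with the new source API a comparable signal is exposed as the currentEmitEventTimeLag metric, pollable over the REST API or a metric reporter) and declare the backfill done once the lag stays under a threshold for several consecutive checks. A Flink-agnostic sketch; the class and all thresholds are illustrative, not part of any Flink API:

```java
// Declares a job "caught up" once observed event-time lag stays below a
// threshold for a required number of consecutive observations.
public class CatchUpDetector {
    private final long maxLagMillis;
    private final int requiredConsecutive;
    private int consecutive = 0;

    public CatchUpDetector(long maxLagMillis, int requiredConsecutive) {
        this.maxLagMillis = maxLagMillis;
        this.requiredConsecutive = requiredConsecutive;
    }

    /** Feed one observation; returns true once the job is considered caught up. */
    public boolean observe(long latestEventTimestampMillis, long nowMillis) {
        if (nowMillis - latestEventTimestampMillis <= maxLagMillis) {
            consecutive++;
        } else {
            consecutive = 0; // a lag spike resets the streak
        }
        return consecutive >= requiredConsecutive;
    }
}
```

Requiring a streak rather than a single low-lag reading avoids flipping to "caught up" on a momentary dip while the S3 phase is still running.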

Re: Question about the flink 1.16 lookup join retry strategy

2022-12-07 Thread Lincoln Lee
If the record still has not been matched when the retries end, the current record is treated as nonexistent: it is filtered out under an inner join, or padded with null values under a left join. https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/dev/table/sql/queries/hints/#%e5%bc%80%e5%90%af%e7%bc%93%e5%ad%98%e5%af%b9%e9%87%8d%e8%af%95%e7%9a%84%e5%bd%b1%e5%93%8d Best, Lincoln Lee casel.chen wrote on Wed, Dec 7, 2022
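For reference, a sketch of enabling the delayed-retry behavior discussed here via the Flink 1.16 LOOKUP query hint (the table and column names are made up for the example; the hint option names follow the 1.16 hints documentation):

```java
// Retry the lookup up to 3 times, 10s apart, when the dimension row is missing.
tableEnv.executeSql(
    "SELECT /*+ LOOKUP('table'='dim', 'retry-predicate'='lookup_miss', "
        + "'retry-strategy'='fixed_delay', 'fixed-delay'='10s', 'max-attempts'='3') */ "
        + "o.order_id, d.name "
        + "FROM orders AS o "
        + "LEFT JOIN dim FOR SYSTEM_TIME AS OF o.proc_time AS d "
        + "ON o.dim_id = d.id");
```

With LEFT JOIN, a record that is still unmatched after the last attempt is emitted with null dimension fields; with an inner join it is dropped.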

Question about the flink 1.16 lookup join retry strategy

2022-12-07 Thread casel.chen
We have scenarios where the dimension-table data arrives after the fact-stream data. When using the flink 1.16 lookup join retry strategy, what happens if a record still has not been matched after the retry attempts are exhausted? Will all the fields to be joined be null?

Re: Re: How to extend flink sql to implement delayed invocation?

2022-12-07 Thread casel.chen
The drawback of an interval join is that it can only output the matched results; it cannot output the results that failed to match (and I need to handle the unmatched results specially afterwards). On 2022-12-07 13:33:50, "Lincoln Lee" wrote: > In a dual-stream join scenario, you can consider introducing a time attribute into the join condition, turning it into an interval join, which achieves a degree of time alignment ( > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/joins/#interval-joins > )

Re: Re: How to extend flink sql to implement delayed invocation?

2022-12-07 Thread casel.chen
Thanks for the suggestion, but we have not upgraded to flink 1.16 yet; we are currently on flink 1.15. If this is to be implemented with flink sql, could window deduplication be used to achieve the effect of a delayed join? When each record arrives, open a 10-minute cumulate window (step and size both 10 minutes), deduplicate by key, and when the window closes, lookup join the deduplicated output against the dimension table. On 2022-12-07 13:33:50, "Lincoln Lee" wrote: > In a dual-stream join scenario, you can consider introducing a time attribute into the join condition, turning it into an interval join, which achieves a degree of time alignment (

Re: Cleanup for high-availability.storageDir

2022-12-07 Thread Alexis Sarda-Espinosa
I see, thanks for the details. I do mean replacing the job without stopping it terminally. Specifically, I mean updating the container image with one that contains an updated job jar. Naturally, the new version must not break state compatibility, but as long as that is fulfilled, the job should

Re: Cleanup for high-availability.storageDir

2022-12-07 Thread Alexis Sarda-Espinosa
Hi Matthias, Then the explanation is likely that the job has not reached a terminal state. I was testing updates *without* savepoints (but with HA), so I guess that never triggers automatic cleanup. Since, in my case, the job will theoretically never reach a terminal state with this

Accessing kafka message key from a KafkaSource

2022-12-07 Thread Noel OConnor
Hi, I'm using a Kafka source to read messages from Kafka into a DataStream. However, I can't seem to access the key of the Kafka message in the DataStream. Is this even possible? cheers Noel

(no subject)

2022-12-07 Thread zhufangliang
Unsubscribe

Re: 1.15.2 jobs frequently (every few tens of minutes to an hour) report LocalTransportException: readAddress(..) failed: Connection timed out.

2022-12-07 Thread yidan zhao
Thanks, but these parameters don't have much to do with netty. The heartbeat and akka ones may relate to RPC timeouts, but my error comes from netty, not the RPC layer. The web and rest ones should relate to client job submission. Stan1005 <532338...@qq.com.invalid> wrote on Wed, Dec 7, 2022 at 15:51: > > I've run into this too; the TM slot count was always 2, and the error appears easily at high parallelism. With TM memory kept at 20480mb, the same job stopped reporting it after lowering the parallelism to 256. > You could also consider increasing these parameters appropriately (look up what each one does before changing anything) > set