This currently looks similar to
https://issues.apache.org/jira/browse/FLINK-19249 and
https://issues.apache.org/jira/browse/FLINK-16030:
an unstable network environment. The same configuration works fine on physical machines.
yidan zhao wrote on Wed, Dec 7, 2022 at 16:11:
>
> Thanks, but these parameters don't have much to do with netty, do they?
> The heartbeat and akka settings may relate to RPC timeouts, but my error comes from netty, not the RPC layer.
> The web and rest options should relate to client-side job submission.
>
> Stan1005
Hello All,
I am seeing the issue below after upgrading from 1.9.0 to 1.14.2 while
publishing messages to Pub/Sub, which is causing frequent job restarts and
slow processing. Can you please help me?
`Caused by: org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint
tolerable failure
If it is a jar that the job depends on, you can build a flat (uber) jar. There are two scenarios:
1. In SQL jobs, the user depends on a connector jar that the platform does not provide, so the user has to upload it.
2. Custom UDF management, where the dependent jars need to be submitted together with the job.
yuxia wrote on Thu, Dec 8, 2022 at 10:06:
> Why do you say third-party dependency jars cannot be submitted? Can't the user simply bundle those packages into the job jar? Or do you mean SQL jobs?
>
> Best regards,
> Yuxia
>
>
From: "melin li"
To: "user-zh"
Sent: Thursday, December 2022
Could you please post the image of the running job graph in Flink UI?
Best regards,
Yuxia
From: "hjw"
To: "User"
Sent: Thursday, December 8, 2022, 12:05:00 AM
Subject: How to set disableChaining like streaming multiple INSERT statements in a
StatementSet ?
Hi,
I create a StatementSet that
Why do you say third-party dependency jars cannot be submitted? Can't the user simply bundle those packages into the job jar? Or do you mean SQL jobs?
Best regards,
Yuxia
From: "melin li"
To: "user-zh"
Sent: Thursday, December 8, 2022, 9:46:45 AM
Subject: Cannot specify third-party jars when submitting a job
When the client submits a Flink job, it cannot submit the third-party jars the job depends on, such as custom function jars or the connector jars a SQL statement needs; these have to be put in place in advance. A platform built on Flink needs to add jars dynamically. The current workaround is to add the dependency jars into the lib directory of the job jar; getJobJarAndDependencies then extracts the dependencies from that jar. This is not very convenient. A parameter for specifying dependency jars could be added; Flink's design has all sorts of quirks here.
It's well-known that Flink does not provide any guarantees on the order in
which a CoProcessFunction (or one of its multiple variations) processes its
two inputs [1]. I wonder then what is the current best practice/recommended
approach for cases where one needs deterministic results in presence
Hi Noel,
It's definitely possible. You need to implement a
custom KafkaRecordDeserializationSchema: its "deserialize" method gives you
a ConsumerRecord as an argument so that you can extract Kafka message key,
headers, timestamp, etc.
Then pass that when you create a KafkaSource via
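If switching to the Table API is an option, an alternative to a custom deserializer is the Kafka SQL connector, which can expose the message key, headers, and timestamp as columns. A sketch, where the topic, bootstrap servers, and column names are hypothetical placeholders:

```sql
CREATE TABLE kafka_src (
  `msg_key` STRING,                                      -- Kafka message key
  `payload` STRING,
  `headers` MAP<STRING, BYTES> METADATA,                 -- Kafka record headers
  `ts`      TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'   -- Kafka record timestamp
) WITH (
  'connector' = 'kafka',
  'topic' = 'my-topic',                                  -- hypothetical
  'properties.bootstrap.servers' = 'localhost:9092',     -- hypothetical
  'key.format' = 'raw',
  'key.fields' = 'msg_key',            -- maps the Kafka key to this column
  'value.format' = 'json',
  'value.fields-include' = 'EXCEPT_KEY'
);
```

The resulting table (or a stream converted from it) carries the key alongside the value fields.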
Hello,
When using the Job Manager API behind an HTTPS proxy that relies on SNI to
route traffic, I run into an issue because the Flink CLI does not send the
SNI when calling the API over HTTPS.
Have other users faced this issue?
Regards
Hi,
I create a StatementSet that contains multiple INSERT statements.
I found that the multiple INSERT tasks are organized into an operator chain
when StatementSet.execute() is invoked.
How can I set disableChaining for multiple INSERT statements in the
StatementSet API, as in the streaming API?
env:
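As far as I know there is no per-statement disableChaining hook in StatementSet, but chaining can be switched off for the whole pipeline via configuration. A coarse-grained sketch (this disables chaining for all operators, not only the INSERTs):

```sql
-- In the SQL client, or via TableConfig in code:
SET 'pipeline.operator-chaining' = 'false';
```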
Hi Vidya Sagar,
Thanks for bringing this up.
The RocksDB state backend defaults to Snappy[1]. If the compression option
is not specifically configured, this vulnerability of ZLIB has no effect on
the Flink application for the time being.
*> is there any plan in the coming days to address this?
I’ve got a Flink job that uses a HybridSource. It reads a bunch of S3 files
and then transitions to a Kafka Topic where it stays processing live data.
Everything gets written to a warehouse where users build and run reports. It
takes about 36 hours to read data from the beginning before it's
If the record is still not matched when the retries end, it is treated as non-existent: filtered out for an inner join, or padded with nulls for a left join.
https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/dev/table/sql/queries/hints/#%e5%bc%80%e5%90%af%e7%bc%93%e5%ad%98%e5%af%b9%e9%87%8d%e8%af%95%e7%9a%84%e5%bd%b1%e5%93%8d
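The retry behaviour above is configured with the LOOKUP hint documented at the linked page; a sketch with hypothetical table and column names:

```sql
SELECT /*+ LOOKUP('table'='dim',
                  'retry-predicate'='lookup_miss',
                  'retry-strategy'='fixed_delay',
                  'fixed-delay'='10s',
                  'max-attempts'='3') */
       o.order_id, d.user_name
FROM orders AS o
LEFT JOIN dim FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.user_id = d.id;
```

With LEFT JOIN, rows still unmatched after the retries are emitted with NULLs for the dimension columns; with an inner join they are dropped.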
Best,
Lincoln Lee
casel.chen wrote on Wed, Dec 7, 2022:
We have scenarios where dimension-table data arrives later than the fact-stream data. With the Flink 1.16 lookup
join retry strategy, what happens if the record still has not been matched after the retries are exhausted? Will the fields to be joined all be null?
The drawback of an interval join is that it only emits matched results; it cannot emit the unmatched ones (and I need to apply special handling to unmatched records afterwards).
On 2022-12-07 13:33:50, "Lincoln Lee" wrote:
>For a dual-stream join, consider adding a time attribute to the join condition, turning it into an interval join, which achieves a degree of time alignment (
>https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/joins/#interval-joins
>)
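The interval join mentioned in the quoted reply looks roughly like this (table and column names are hypothetical):

```sql
-- Emits a pair only when the two rows' event times lie within
-- 10 minutes of each other.
SELECT o.id, o.order_time, s.ship_time
FROM orders o JOIN shipments s
  ON o.id = s.order_id
 AND o.order_time BETWEEN s.ship_time - INTERVAL '10' MINUTE
                      AND s.ship_time;
```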
Thanks for the suggestion, but we have not upgraded to Flink 1.16 yet; we are currently on Flink 1.15.
If we implement this with Flink SQL, could we use window deduplication to achieve the delayed-join effect?
After each record arrives, open a 10-minute cumulate window (step and size both 10 minutes) and deduplicate by key; when the window closes, lookup join the emitted deduplicated results against the dimension table.
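The idea above might be sketched as follows, assuming a Flink version that supports window TVFs and window deduplication; table, key, and time-attribute names are hypothetical:

```sql
-- Keep the latest row per key within each 10-minute cumulate window;
-- the rows emitted at window end would then be lookup-joined
-- against the dimension table.
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY window_start, window_end, order_key
           ORDER BY event_time DESC) AS rn
  FROM TABLE(CUMULATE(TABLE fact_stream, DESCRIPTOR(event_time),
                      INTERVAL '10' MINUTES, INTERVAL '10' MINUTES))
)
WHERE rn = 1;
```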
On 2022-12-07 13:33:50, "Lincoln Lee" wrote:
>For a dual-stream join, consider adding a time attribute to the join condition, turning it into an interval join, which achieves a degree of time alignment (
I see, thanks for the details.
I do mean replacing the job without stopping it terminally. Specifically, I
mean updating the container image with one that contains an updated job
jar. Naturally, the new version must not break state compatibility, but as
long as that is fulfilled, the job should
Hi Matthias,
Then the explanation is likely that the job has not reached a terminal
state. I was testing updates *without* savepoints (but with HA), so I guess
that never triggers automatic cleanup.
Since, in my case, the job will theoretically never reach a terminal state
with this
Hi,
I'm using a kafka source to read in messages from kafka into a datastream.
However, I can't seem to access the key of the Kafka message in the datastream.
Is this even possible?
cheers
Noel
Unsubscribe
Thanks, but these parameters don't have much to do with netty, do they?
The heartbeat and akka settings may relate to RPC timeouts, but my error comes from netty, not the RPC layer.
The web and rest options should relate to client-side job submission.
Stan1005 <532338...@qq.com.invalid> 于2022年12月7日周三 15:51写道:
>
> I have run into this too: the TM slot count stayed at 2, and the error appears easily at high parallelism. With TM memory kept at 20480 MB, the same job stopped reporting it once the parallelism was lowered to 256.
> You could also consider increasing the following parameters somewhat (look up what each one does before changing it):
> set