Flink outputting abnormal data

2023-05-28 Thread 小昌同学
Hi all, I have a job that has been running for a long time, but recently some dirty data from the source system caused it to fail. The YARN logs show a NullPointerException. I would now like to capture that particular dirty record. Is there any way to do this? Thanks in advance for your guidance. | 小昌同学 | ccc0606fight...@163.com |
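A minimal sketch (not from the original thread) of one common way to capture the offending record: wrap the job's own parsing/transformation logic in a try/catch inside a ProcessFunction and route records that throw to a side output. Names such as rawStream and transform(...) are hypothetical placeholders for the job's existing code.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// rawStream is assumed to be an existing DataStream<String> coming from the job's source.
// Side output for records that blow up during processing (e.g. with an NPE).
final OutputTag<String> dirtyTag = new OutputTag<String>("dirty-records") {};

SingleOutputStreamOperator<String> cleaned = rawStream.process(
    new ProcessFunction<String, String>() {
        @Override
        public void processElement(String raw, Context ctx, Collector<String> out) {
            try {
                out.collect(transform(raw)); // transform(...) stands for the job's existing logic
            } catch (Exception e) {
                ctx.output(dirtyTag, raw);   // keep the raw dirty record instead of failing the job
            }
        }
    });

cleaned.getSideOutput(dirtyTag).print("DIRTY"); // or write it to a durable sink for inspection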

Re: [ANNOUNCE] Apache Flink 1.16.2 released

2023-05-28 Thread weijie guo
Hi Jing, Thank you for caring about the release process. It has to be said that the entire process went smoothly. We have very comprehensive documentation[1] to guide the work, thanks to the contributions of previous release managers and the community. Regarding the obstacles, I actually only

Re: Solution for FlinkSQL sliding windows with a large window and small slide

2023-05-28 Thread tanjialiang
Hi, Shammon FY. In theory, increasing the parallelism can alleviate it, but raising the parallelism too much may be costly. The write itself does not actually need many resources; the problem is that the data volume when the window fires is too large (the cardinality of users), and the per-record merge cost is too high (one record's window aggregation has to be merged 24*60/5 = 288 times). So far I can only think of the following solutions: 1. Increase the window slide, which solves the problem at the root (a longer slide means the window fires later). 2. Increase the slide and use the early-fire mechanism to emit windows that have not finished computing (the emitted data may not match the business meaning of "last 24 hours", and the downstream system must support upsert). 3.
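As a hedged illustration of the early-fire mechanism mentioned in option 2 (not taken from the original message): the table planner has experimental emit options, table.exec.emit.early-fire.enabled and table.exec.emit.early-fire.delay, which periodically emit partial window results before the window closes. They have historically applied to legacy GROUP BY window aggregations and may not cover window TVFs, so verify against the Flink version in use; the one-minute delay below is an arbitrary assumption.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// Experimental: fire partial window results every minute instead of waiting for the
// window to close; the downstream system must then accept upsert/update records.
tEnv.getConfig().getConfiguration()
    .setString("table.exec.emit.early-fire.enabled", "true");
tEnv.getConfig().getConfiguration()
    .setString("table.exec.emit.early-fire.delay", "1 min");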

Error when Flink writes to Kafka with exactly-once semantics

2023-05-28 Thread lxk
The previous email was sent by mistake; resending it. The project uses exactly-once semantics to write to Kafka; the code and configuration are as follows. The write code is: Properties producerProperties = MyKafkaUtil.getProducerProperties(); KafkaSink kafkaSink = KafkaSink.builder() .setBootstrapServers(Event2Kafka.parameterTool.get("bootstrap.server")) .setRecordSerializer(KafkaRecordSerializationSchema.builder()
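Since the snippet above is cut off by the archive preview, here is a hedged, self-contained sketch of an exactly-once KafkaSink configuration (topic name, broker address and transaction timeout are assumptions, not values from the thread):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

Properties producerProperties = new Properties();
// Must not exceed the broker's transaction.max.timeout.ms (15 minutes assumed here).
producerProperties.setProperty("transaction.timeout.ms", "900000");

KafkaSink<String> kafkaSink = KafkaSink.<String>builder()
    .setBootstrapServers("broker1:9092")                        // hypothetical address
    .setKafkaProducerConfig(producerProperties)
    .setRecordSerializer(KafkaRecordSerializationSchema.builder()
        .setTopic("events")                                     // hypothetical topic
        .setValueSerializationSchema(new SimpleStringSchema())
        .build())
    .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)       // also requires checkpointing to be enabled
    .setTransactionalIdPrefix("event2kafka")                    // required with EXACTLY_ONCE
    .build();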

Error when Flink writes to Kafka with exactly-once semantics

2023-05-28 Thread lxk
The project uses exactly-once semantics to write to Kafka; the code and configuration are as follows: Properties producerProperties = MyKafkaUtil.getProducerProperties(); KafkaSink kafkaSink = KafkaSink.builder() .setBootstrapServers(Event2Kafka.parameterTool.get("bootstrap.server")) .setRecordSerializer(KafkaRecordSerializationSchema.builder()

Re: Backpressure handling in FileSource APIs - Flink 1.16

2023-05-28 Thread Shammon FY
Hi Kamal, The network buffer for a specific `FileSource` will fill up when the job has backpressure, which will block the source subtask. You can refer to the network buffer post [1] for more information. [1] https://flink.apache.org/2019/06/05/a-deep-dive-into-flinks-network-stack/ Best, Shammon FY On

Re: Solution for FlinkSQL sliding windows with a large window and small slide

2023-05-28 Thread Shammon FY
Hi, is the volume of data emitted after the window fires too large? Would adding resources and increasing the parallelism of the window computation alleviate the problem? Best, Shammon FY On Fri, May 26, 2023 at 2:03 PM tanjialiang wrote: > Hi, all. > I ran into some problems using FlinkSQL's window TVF sliding window. > With a 5-minute slide, a 24-hour window, and a sliding window grouped by > user_id, when the job crashes or consumes from Kafka's earliest-offset, checkpoints rarely succeed. >
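For context, a hedged sketch of the kind of query the quoted question describes: a HOP window TVF with a 5-minute slide over a 24-hour window, grouped by user_id. The events table, its columns and the datagen connector are assumptions used only to make the example self-contained.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

// Hypothetical source table with an event-time attribute and watermark.
tEnv.executeSql(
    "CREATE TABLE events (" +
    "  user_id STRING," +
    "  event_time TIMESTAMP(3)," +
    "  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND" +
    ") WITH ('connector' = 'datagen')");

// Sliding window: 5-minute slide, 24-hour size, one aggregate row per user per window.
tEnv.executeSql(
    "SELECT window_start, window_end, user_id, COUNT(*) AS cnt " +
    "FROM TABLE(HOP(TABLE events, DESCRIPTOR(event_time), INTERVAL '5' MINUTE, INTERVAL '24' HOUR)) " +
    "GROUP BY window_start, window_end, user_id").print(); // streams results until cancelled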