On 2019/9/3 15:38, 守护 wrote:
org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Received late 
message for now expired checkpoint attempt 3 from 
24674987621178ed1a363901acc5b128 of job fd5010cbf20501339f1136600f0709c3.
What is causing this?

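You can use the IDs of the failed tasks to look up which TaskManager each one was running on. In my case they all turned out to be on the same machine. The web UI showed that this machine was receiving noticeably more incoming data than the others. So the problem was caused by data skew, and at its root the downstream operators could not consume the data fast enough.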
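The check described above (group the tasks by the machine they run on, then compare inflow) can be sketched as below. This is a minimal sketch under stated assumptions. It takes per-subtask records that you have already collected from the web UI or REST API. The field names `host` and `read-records`, the hypothetical host names, and the 2x-median threshold are all illustrative assumptions, not a Flink API. It sums incoming records per host and flags hosts whose inflow is far above the median.

```python
from collections import defaultdict
from statistics import median


def records_per_host(subtasks):
    """Sum incoming record counts per TaskManager host.

    Each subtask is assumed (hypothetically) to be a dict with a
    "host" key and a "read-records" key holding its incoming count.
    """
    totals = defaultdict(int)
    for st in subtasks:
        totals[st["host"]] += st["read-records"]
    return dict(totals)


def skewed_hosts(totals, factor=2.0):
    """Return hosts whose inflow exceeds `factor` times the median host inflow."""
    if not totals:
        return []
    mid = median(totals.values())
    return sorted(host for host, n in totals.items() if n > factor * mid)


# Hypothetical sample: tm-c receives far more data than the other hosts.
subtasks = [
    {"host": "tm-a", "read-records": 1000},
    {"host": "tm-b", "read-records": 1100},
    {"host": "tm-c", "read-records": 9000},
    {"host": "tm-a", "read-records": 900},
]

print(skewed_hosts(records_per_host(subtasks)))  # → ['tm-c']
```

The median is used as the baseline because a single overloaded host would pull the mean upward and partly hide itself.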

See also:
https://juejin.im/post/5c374fe3e51d451bd1663756
