I forgot to mention that the job was recently moved from a cluster with
SSD disks to one with SATA and SSD disks. On the old hardware, everything
worked fine. The Flink version is 1.6.2. RocksDB was using the flash-optimized
settings; I've changed them to SPINNING_DISK_OPTIMIZED, and it had no effect.
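
For readers following along, a minimal sketch of what such a change can look
like when the state backend is configured in code. It assumes the
flink-statebackend-rocksdb dependency is on the classpath; the class name and
the checkpoint URI are hypothetical placeholders, and incremental
checkpointing is enabled only for illustration.

```java
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbOptionsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical checkpoint location (assumption, not from the thread).
        // The second argument enables incremental checkpoints (also an assumption).
        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints", true);

        // Switch from the flash-optimized profile to the spinning-disk profile.
        backend.setPredefinedOptions(PredefinedOptions.SPINNING_DISK_OPTIMIZED);

        env.setStateBackend(backend);

        // ... build the job topology and call env.execute(...) here.
    }
}
```

The predefined options only tune RocksDB itself, so on their own they may not
help if the bottleneck is elsewhere (for example, network or checkpoint I/O).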

Old servers:
https://www.hetzner.de/dedicated-rootserver/px91-ssd

New server:
https://www.hetzner.de/dedicated-rootserver/ax60-ssd

On Mon, Dec 3, 2018 at 8:07 PM Sayat Satybaldiyev <stb.satyb...@gmail.com>
wrote:

> Dear Flink community,
>
> Could anyone give a clue on how to debug a job that has high backpressure
> in the Kafka source? We have a Flink job that joins two streams from two
> Kafka topics via a ProcessFunction, using the RocksDB state backend. The job
> is lagging significantly, ~8 hours behind, and produces incorrect results.
>
> The Flink UI indicates that the source functions (recommendation stream and
> custom source) are backpressured, while the recommendation-click join is fine.
>
> I've looked into the JM and TM logs and nothing looks strange to me, except
> for "Kafka error sending fetch request", which happens during a checkpoint.
> Checkpoints happen once every 20 min and use almost all of the network
> interface's bandwidth.
>
> Please find UI screenshots and flink logs attached to this email.
>
>
> https://drive.google.com/file/d/14h8zwC_49wxt5uNPYtM3LN6WhJ7lyeVS/view?usp=sharing
>
> https://drive.google.com/file/d/1s6I___S7u0pBJyWdnmYaH0e_MwGr3CgY/view?usp=sharing
>
>
