Hey guys,
I am struggling to improve the throughput of my simple Flink application.
The target topology is this:

read_from_kafka (byte array deserializer)
  --rescale--> process_function (Confluent Avro deserialization)
  --split--> 1. data_sink, 2. dlq_sink

Kafka traffic is pretty high:
Partitions: 128
Traffic: ~500k msg/s, ~50 Mbps

Flink is running on Kubernetes, writing to HDFS 3. I have ~200 CPUs and 400 GB
of memory at hand. I have tried a few configurations, but I am not able to get
the throughput above ~1 mil records/s (and I need more than that to recover
from failures). I have tried increasing parallelism a lot (up to 512), but it
has very little impact on throughput. The primary metrics I am watching are
the Kafka source's numRecordsOut and the message backlog. I have also already
raised the Kafka consumer defaults such as max.poll.records. Here are a few
things I tried already.
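For reference, the consumer overrides I mentioned are along these lines (the exact values here are illustrative, not my production settings):

```properties
# Kafka consumer properties passed to the Flink Kafka source
# (values are examples only)
max.poll.records=2000
fetch.min.bytes=1048576
max.partition.fetch.bytes=10485760
receive.buffer.bytes=1048576
```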
Try0: Check raw Kafka consumer throughput (kafka_source -> discarding_sink)
tm: 20, slots: 4, parallelism: 80
throughput: ~10 mil/s

Try1: Disable operator chaining, which forces a network exchange between the
source and the process function.
tm: 20, slots: 4, parallelism: 80
throughput: ~1 mil/s
I also tried increasing floating-buffers to 100 and buffers-per-channel to 64.
Increasing parallelism seems to have no effect.
Observation: in/out buffers are always at 100% utilization.
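For concreteness, the buffer settings I changed correspond to these flink-conf.yaml keys (Flink 1.11; the values are the ones from the attempt above):

```yaml
# flink-conf.yaml
taskmanager.network.memory.buffers-per-channel: 64        # default is 2
taskmanager.network.memory.floating-buffers-per-gate: 100 # default is 8
```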

After this I tried various combinations of network configs, parallelism, and
JVM sizes, but throughput seems to be stuck at ~1 mil/s. Can someone please
help me figure out which key metrics to look at and how I can improve the
situation? Happy to provide any details needed.

Flink version: 1.11.2
