I want my consumers to process large batches, so I want the consumer
listener to wake up on, say, 1800 MB of data or every 5 minutes, whichever
comes first.
Mine is a Spring Boot Kafka application, the topic has 28 partitions, and
these are the settings I explicitly changed:
| Parameter                 | Value I set | Default value | Why I set it this way    |
| ------------------------- | ----------- | ------------- | ------------------------ |
| fetch.max.bytes           | 1801 MB     | 50 MB         | fetch.min.bytes + 1 MB   |
| fetch.min.bytes           | 1800 MB     | 1 byte        | desired batch size       |
| fetch.max.wait.ms         | 5 min       | 500 ms        | desired cadence          |
| max.partition.fetch.bytes | 1801 MB     | 1 MB          | unbalanced partitions    |
| request.timeout.ms        | 5 min + 1 s | 30 s          | fetch.max.wait.ms + 1 s  |
| max.poll.records          | 10000       | 500           | 1500 found too low       |
| max.poll.interval.ms      | 5 min + 1 s | 5 min         | fetch.max.wait.ms + 1 s  |
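
For reference, the table values translate to raw bytes and milliseconds roughly as in the sketch below. The class name `ConsumerOverrides` and the `build()` helper are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the keys are written as plain strings so the snippet has no Kafka dependency. In the real application these would be passed through the Spring Boot consumer properties rather than a hand-built map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the table's overrides expressed in raw units.
final class ConsumerOverrides {
    private static final int MB = 1024 * 1024;            // assumption: binary megabytes
    private static final int FIVE_MINUTES_MS = 5 * 60 * 1000;

    static Map<String, Object> build() {
        Map<String, Object> cfg = new LinkedHashMap<>();
        int minBytes = 1800 * MB;                          // desired batch size (still fits in an int)
        cfg.put("fetch.min.bytes", minBytes);
        cfg.put("fetch.max.bytes", minBytes + MB);         // fetch.min.bytes + 1 MB
        cfg.put("max.partition.fetch.bytes", minBytes + MB);
        cfg.put("fetch.max.wait.ms", FIVE_MINUTES_MS);     // desired cadence
        cfg.put("request.timeout.ms", FIVE_MINUTES_MS + 1000);
        cfg.put("max.poll.interval.ms", FIVE_MINUTES_MS + 1000);
        cfg.put("max.poll.records", 10000);
        return cfg;
    }

    public static void main(String[] args) {
        // Print each override so it can be compared with the values the
        // consumer reports at startup.
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Comparing this output with the `ConsumerConfig values` block that the Kafka client logs at startup is one way to check whether the overrides actually reach the consumer.
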
Nevertheless, when I produce ~2 GB of data to the topic, I see that the
consumer listener (a batch listener) is called many times per second, far
more often than the desired rate.
I logged the serialized size of the `ConsumerRecords<?,?>` argument and
found that it is never more than 55 MB.
This suggests that my attempt to raise `fetch.max.bytes` above the default
50 MB is not taking effect.
Any ideas on how I can troubleshoot this?
-----
Edit:
I found this question:
https://stackoverflow.com/questions/72812954/kafka-msk-a-configuration-of-high-fetch-max-wait-ms-and-fetch-min-bytes-is-beh?rq=1
Is it really impossible as stated?