Hi, I am puzzled by this consumer group behavior. I have 40 consumers in a single consumer group with large lag. We are looking to increase the number of partitions and consumers, as the lag indicates that records are produced faster than this consumer group can consume them.
However, I observed something interesting. In hour 10, this consumer group consumed/produced 380 million records. Using the same binary, in hour 22 it produced 250 million records to its output topics. Both hours had a considerable amount of lag. The stream engine seems to be fetching records at the rate at which producers produce them. If it is capable of processing 380 million records in an hour, why doesn't it fetch more records in hour 22 to get rid of the lag? I tried a few tweaks to the consumer configuration, to no avail. Based on our “received counts” metric, we are definitely not receiving enough records to address the lag. Could anyone help or comment on this? Thank you.
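P.S. For scale, here is a rough sketch of the per-consumer rates implied by the figures above. It assumes the load is spread evenly across the 40 consumers, which may not hold in practice, and the names are only illustrative.

```python
# Figures quoted in the post; the even split across consumers is an assumption.
CONSUMERS = 40
RECORDS_PER_HOUR = {"hour 10": 380_000_000, "hour 22": 250_000_000}


def per_consumer_rate(records_per_hour: int, consumers: int = CONSUMERS) -> float:
    """Average records per second handled by one consumer."""
    return records_per_hour / consumers / 3600


rates = {hour: per_consumer_rate(n) for hour, n in RECORDS_PER_HOUR.items()}
for hour, rate in rates.items():
    print(f"{hour}: {rate:,.0f} records/s per consumer")

# How hour 22 compares with the throughput already demonstrated in hour 10.
ratio = RECORDS_PER_HOUR["hour 22"] / RECORDS_PER_HOUR["hour 10"]
print(f"hour 22 throughput is {ratio:.0%} of hour 10")
```

So in hour 22 each consumer is running at roughly two thirds of the rate it sustained in hour 10, even though there is lag available to consume.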