Hello, I would like to ask a question about message throughput from Kafka into Spark Streaming. I want to put a large data file (e.g., a *.csv file) into Kafka and then use Spark Streaming to read it back out. The file is about 100 MB with ~250K messages/rows (each row has about 10 integer fields). Spark Streaming receives the first two batches with a large number of messages (the first has ~60K messages and the second ~50K), but from the third batch onward it receives exactly 200 messages per batch. That is far fewer than the first two batches. In addition, when I put other files into Kafka, every batch contains exactly 200 messages, just like the third batch of the first file.
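In case it helps, here is roughly how I produce the file into Kafka (a sketch using kafka-python; the broker address, topic name, and file path are placeholders, not my real setup):

```python
# Rough sketch of how I push the CSV file into Kafka, one message per row.
# Broker address, topic name, and file path below are placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
with open("data.csv", "rb") as f:
    for line in f:                              # one Kafka message per CSV row
        producer.send("csv-topic", line.rstrip(b"\n"))
producer.flush()   # make sure all ~250K rows are actually sent
producer.close()
```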
I suspect the problem is coming from Kafka or from some configuration in Spark. I have already tried setting "auto.offset.reset=largest", but that did not resolve it; I still get only 200 messages per batch. I hope my problem is clear. Could anyone tell me how to fix it, please? Thank you so much. Best regards, Alex
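P.S. For reference, here is roughly how I create the stream on the Spark side (a sketch against the old DStream/Kafka direct API; the broker address, topic name, and batch interval are placeholders, not my exact values):

```python
# Sketch of my consumer setup, including the auto.offset.reset setting I tried.
# Broker address, topic name, and batch interval are placeholders.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName("csv-from-kafka")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=10)    # 10-second batches

kafka_params = {
    "metadata.broker.list": "localhost:9092",   # placeholder broker
    "auto.offset.reset": "largest",             # the setting I already tried
}
stream = KafkaUtils.createDirectStream(ssc, ["csv-topic"], kafka_params)
stream.count().pprint()    # print how many messages each batch received

ssc.start()
ssc.awaitTermination()
```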