We are currently using the Kafka S3 connector to ship Avro data to S3. We recently made a change to one of our Avro schemas and noticed consumer throughput on the connector drop considerably. Is there anything we can do to avoid this kind of issue when we update schemas in the future?
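For context, this is roughly what our sink connector config looks like; I'm quoting it from memory rather than verbatim, and the topic, bucket, and registry URL are placeholders:

    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=events-topic
    s3.bucket.name=example-bucket
    s3.region=us-east-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.avro.AvroFormat
    partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
    # files are committed once this many records have accumulated
    flush.size=10000
    value.converter=io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url=http://schema-registry:8081
    # I believe we have not set schema.compatibility, so it is at its default
    schema.compatibility=NONE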
This is what I believe is happening:

· The Avro producer application runs on 12 instances. They are restarted in a rolling fashion, switching from producing schema version 1 before the restart to schema version 2 afterward.
· While the rolling restart is in progress, data on schema version 1 and schema version 2 is being written to the topic simultaneously.
· The Kafka connector has to close the current Avro file for a partition and ship it whenever it detects a schema change, which happens several times because of the rolling nature of the deployment and the resulting mix of message versions. This causes the overall consumer throughput to plummet.

Am I reasoning correctly about what we're observing here? Is there any way to avoid this when we change schemas, short of stopping all instances of the service and bringing them up together on the new schema version?

Thanks,
Dave
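P.S. While reading the S3 connector docs I noticed the schema.compatibility setting, which (as in the config sketch above) I believe we have left at its default of NONE. If I'm reading the docs correctly, changing it along these lines:

    # let the connector project older records onto the newest compatible
    # schema instead of rotating the file on every schema change
    schema.compatibility=BACKWARD

should let the connector keep writing to the same file across the rollout, provided the schema change itself is backward compatible. Is that the right knob here, or is there a better approach?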