Hello, I'm building a streaming data platform using Beam on Dataflow, Pub/Sub, and Avro for message schemas and serialization.
My plan is to maintain full schema compatibility for messages on the same topic. Because Avro requires the writer schema in order to deserialize a payload and resolve it against a compatible reader schema, the encoded input/output messages need to use Avro's single-object encoding (or an equivalent mechanism), which prefixes each payload with a schema fingerprint that can be dereferenced against a schema registry or cache at runtime to deserialize and convert the payload.

Does Beam have any built-in support for this pattern? PubsubIO, for example, uses AvroCoder, which writes plain Avro binary encoding without the single-object header and so appears to have no support for this (though it may have had in the past, based on some Git archaeology).

If not, how do other users handle schema changes in their input data streams? Do you avoid them altogether, migrating consumers on each schema change, or do you solve it manually with something like the pattern above? (I've sketched what I mean in the postscript below.)

Tangentially: what about state schema evolution? I've had trouble finding any documentation on how to tackle this, whether for AvroCoder-coded state or for state using a Beam row schema.

Thanks,
Patrick Lucas
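P.S. For concreteness, here is a minimal sketch of the manual pattern I'm describing, using Avro's BinaryMessageEncoder/BinaryMessageDecoder (which implement the single-object encoding) inside a Beam DoFn. The SingleObjectDecodeFn name, the Event schema, and the static SchemaStore.Cache are placeholders for illustration; a real version would resolve fingerprints against an actual registry:

import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.SchemaStore;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.DoFn;

/** Decodes single-object-encoded Avro payloads read from Pub/Sub. */
public class SingleObjectDecodeFn extends DoFn<PubsubMessage, GenericRecord> {

  // Placeholder reader schema for illustration; in practice this would come
  // from a generated class or a registry.
  private static final Schema READER_SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"},"
          + "{\"name\":\"source\",\"type\":\"string\",\"default\":\"unknown\"}]}");

  private transient BinaryMessageDecoder<GenericRecord> decoder;

  @Setup
  public void setup() {
    // SchemaStore.Cache resolves writer schemas by fingerprint. A real
    // pipeline would back this with a registry lookup rather than a static,
    // local cache.
    SchemaStore.Cache store = new SchemaStore.Cache();
    store.addSchema(READER_SCHEMA); // register older writer versions here too
    decoder = new BinaryMessageDecoder<>(GenericData.get(), READER_SCHEMA, store);
  }

  @ProcessElement
  public void process(@Element PubsubMessage msg, OutputReceiver<GenericRecord> out)
      throws IOException {
    // decode() reads the 2-byte marker (0xC3 0x01) and the 8-byte CRC-64-AVRO
    // fingerprint, looks up the writer schema, and resolves it against
    // READER_SCHEMA, applying defaults / dropping removed fields as needed.
    out.output(decoder.decode(msg.getPayload()));
  }

  /** Producer-side counterpart: encode a record with the single-object framing. */
  static ByteBuffer encode(GenericRecord record, Schema writerSchema) throws IOException {
    return new BinaryMessageEncoder<GenericRecord>(GenericData.get(), writerSchema)
        .encode(record);
  }
}

Wiring it in would presumably look like PubsubIO.readMessages() followed by ParDo.of(new SingleObjectDecodeFn()), with the output coder set explicitly (e.g. AvroCoder.of(GenericRecord.class, READER_SCHEMA)), since a coder for GenericRecord can't be inferred otherwise.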
