I had the same problem a while ago, and for the same reasons you mention
we decided to use fingerprints (MD5 hashes of the schema). However, there
are some catches.
First, I believe the normalisation of the schema is incomplete, so you
might end up with different hashes for the same schema.
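To illustrate the catch: if the schema text is hashed without a complete canonical form, two schemas that are semantically identical can produce different fingerprints. The sketch below is only an approximation of what normalization involves; a JSON re-serialization with sorted keys stands in for Avro's real Parsing Canonical Form, which additionally fully qualifies names and strips non-semantic attributes such as `doc`.

```python
import hashlib
import json

# Two textually different but semantically identical Avro schemas:
# attribute order and whitespace differ, meaning does not.
schema_a = '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'
schema_b = '{"name": "User", "type": "record", "fields": [{"type": "long", "name": "id"}]}'

def naive_fingerprint(schema_json: str) -> str:
    # Hashing the raw schema text: equivalent schemas get different hashes.
    return hashlib.md5(schema_json.encode("utf-8")).hexdigest()

def normalized_fingerprint(schema_json: str) -> str:
    # Minimal stand-in for normalization: parse, then re-serialize with
    # sorted keys and no insignificant whitespace. Avro's actual
    # Parsing Canonical Form does more than this.
    canonical = json.dumps(json.loads(schema_json),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

print(naive_fingerprint(schema_a) == naive_fingerprint(schema_b))            # False
print(normalized_fingerprint(schema_a) == normalized_fingerprint(schema_b))  # True
```

An implementation that normalizes less than another (the "incomplete" case) lands somewhere in between: some equivalent schemas collapse to one hash, others don't.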
Thanks for the reply, Svante!
What causes the schema normalization to be incomplete? And is that a
problem? As long as the reader can get the schema, it shouldn't matter that
there are duplicates – as long as the differences between the duplicates do
not affect decoding.
Would it make sense to
> What causes the schema normalization to be incomplete?
Bad implementation: I use the C++ Avro implementation, and it's incomplete
and not very actively maintained.
> And is that a problem? As long as the reader can get the schema, it
> shouldn't matter that there are duplicates – as long as the differences
> between the duplicates
The Confluent tools seem very oriented towards a Java-heavy
infrastructure, and I'd rather not have to re-implement all of their
somewhat complex tooling in Ruby and Go. I'd much prefer a simplified
model that can be implemented more easily.
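One shape such a simplified model could take (an assumption on my part, not an existing Confluent format): prefix each message with the fingerprint of the writer's canonicalized schema, and let readers resolve that fingerprint to a schema through any shared store, with no central id-assigning registry in the write path. The `encode_message`/`decode_message` names below are hypothetical.

```python
import hashlib

def encode_message(schema_canonical: str, payload: bytes) -> bytes:
    # Hypothetical simplified wire format: a 16-byte MD5 fingerprint of
    # the writer's canonicalized schema, followed by the Avro-encoded
    # payload bytes.
    fingerprint = hashlib.md5(schema_canonical.encode("utf-8")).digest()
    return fingerprint + payload

def decode_message(message: bytes) -> tuple[bytes, bytes]:
    # Split the fixed-size fingerprint off the front; the caller then
    # resolves the fingerprint to a schema before decoding the payload.
    return message[:16], message[16:]

msg = encode_message('{"type":"string"}', b"\x06foo")
fp, payload = decode_message(msg)
```

Since the fingerprint is a pure function of the schema, any client in any language can compute it independently, which is what makes small Ruby and Go implementations feasible.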
As an aside, Confluent *could* support such a