The Confluent tools seem very oriented towards a Java-heavy infrastructure, and I'd rather not re-implement all of their fairly complex tooling in Ruby and Go. I'd much prefer a simplified model that is easier to implement. As an aside, Confluent *could* support such a standard by using a custom "fingerprint type" that is just their id number.
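To make the idea concrete, here is a minimal sketch (in Python, for brevity) of a framing that could carry either a registry id or a schema fingerprint behind a one-byte "fingerprint type" tag, as suggested above. The tag values and function names are hypothetical, chosen only for illustration; they are not part of any existing standard or of Confluent's tooling.

```python
import struct

# Hypothetical tag values for the one-byte "fingerprint type".
TYPE_REGISTRY_ID = 0x00      # 32-bit registry id (Confluent-style)
TYPE_MD5_FINGERPRINT = 0x01  # 128-bit schema fingerprint

def frame_with_registry_id(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro binary payload with <type byte><4-byte big-endian id>."""
    return struct.pack(">BI", TYPE_REGISTRY_ID, schema_id) + avro_payload

def unframe(message: bytes):
    """Split a framed message into (fingerprint_type, id_or_fingerprint, payload)."""
    ftype = message[0]
    if ftype == TYPE_REGISTRY_ID:
        (schema_id,) = struct.unpack(">I", message[1:5])
        return ftype, schema_id, message[5:]
    if ftype == TYPE_MD5_FINGERPRINT:
        return ftype, message[1:17], message[17:]
    raise ValueError("unknown fingerprint type: %d" % ftype)
```

A reader that only understands one fingerprint type can still reject unknown framings cleanly, which is the interoperability property the custom-type idea is after.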
On Thu, Jul 9, 2015 at 2:21 PM Svante Karlsson <svante.karls...@csi.se> wrote:

> >> What causes the schema normalization to be incomplete?
>
> Bad implementation. I use C++ Avro and it's not complete and not very
> active.
>
> >> And is that a problem? As long as the reader can get the schema, it
> >> shouldn't matter that there are duplicates – as long as the differences
> >> between the duplicates do not affect decoding.
>
> Not really a problem; we tend to use machine-generated schemas and they
> are always identical.
>
> I think there are holes in the simplification of types, if I remember
> correctly. Namespaces should be collapsed,
> {"type" : "string"} -> "string", etc.
>
> The current implementation can't reliably decide whether two types are
> identical. If you correct the problem later, then a registered schema
> would actually change its hash, since it can now be simplified. Whether
> this is a problem depends on your application.
>
> We currently encode this as you suggest: <schema_type (byte)><schema_id
> (32/128bits)><avro (binary)>. The binary fields should probably have a
> defined endianness as well.
>
> I agree that a de facto way of encoding this would be nice. Currently I
> would say that the Confluent / LinkedIn way is the norm.
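The type-simplification Svante describes can be sketched as a small recursive rewrite. This is only an illustration of the {"type": "string"} -> "string" collapse mentioned above, not a complete canonical form: it deliberately omits namespace collapsing and other normalization steps, and the function name is my own.

```python
AVRO_PRIMITIVES = {"null", "boolean", "int", "long",
                   "float", "double", "bytes", "string"}

def simplify(schema):
    """Collapse redundant type wrappers, e.g. {"type": "string"} -> "string",
    recursing into unions, records, arrays, and maps. A sketch, not a full
    canonicalization (namespaces are left untouched)."""
    if isinstance(schema, list):  # union
        return [simplify(s) for s in schema]
    if isinstance(schema, dict):
        # A dict whose only key is "type" naming a primitive collapses.
        if set(schema) == {"type"} and schema["type"] in AVRO_PRIMITIVES:
            return schema["type"]
        out = dict(schema)
        if "fields" in out:  # record
            out["fields"] = [dict(f, type=simplify(f["type"]))
                             for f in out["fields"]]
        if out.get("type") == "array":
            out["items"] = simplify(out["items"])
        if out.get("type") == "map":
            out["values"] = simplify(out["values"])
        return out
    return schema
```

The point of the thread follows directly from this: a fingerprint is a hash of the simplified JSON, so any later fix to the simplification rules changes the hash of already-registered schemas.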