Thanks Jun & Phil!

Shone

On Thu, Jun 13, 2013 at 12:00 AM, Jun Rao <jun...@gmail.com> wrote:

> Yes, we just have a customized encoder that encodes the first 4 bytes of
> the md5 of the schema, followed by the Avro bytes.
>
> Thanks,
>
> Jun
>
>
> On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler <shone.sad...@gmail.com> wrote:
>
> > Jun,
> > I like the idea of an explicit version field, if the schema can be
> > derived from the topic name itself. The storage (say 1-4 bytes) would
> > require less overhead than a 128-bit md5, at the added cost of managing
> > the version#.
> >
> > Is it correct to assume that your applications are using two schemas
> > then: one system-level schema to deserialize the schema id and bytes for
> > the application message, and a second schema to deserialize those bytes
> > with the application schema?
> >
> > Thanks again!
> > Shone
> >
> >
> > On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Actually, currently our schema id is the md5 of the schema itself. Not
> > > fully sure how this compares with an explicit version field in the
> > > schema.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > At LinkedIn, we are using option 2.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler <shone.sad...@gmail.com> wrote:
> > > >
> > > >> Hello everyone,
> > > >>
> > > >> After doing some searching on the mailing list for best practices on
> > > >> integrating Avro with Kafka, there appear to be at least 3 options
> > > >> for integrating the Avro schema: 1) embedding the entire schema
> > > >> within the message, 2) embedding a unique identifier for the schema
> > > >> in the message, and 3) deriving the schema from the topic/resource
> > > >> name.
> > > >>
> > > >> Option 2 appears to be the best option in terms of both efficiency
> > > >> and flexibility. However, from a programming perspective it
> > > >> complicates the solution with the need for both an envelope schema
> > > >> (containing a "schema id" and a "bytes" field for record data) and a
> > > >> message schema (containing the application-specific message fields).
> > > >> This requires two levels of serialization/deserialization.
> > > >>
> > > >> Questions:
> > > >> 1) How are others dealing with versioning of schemas?
> > > >> 2) Is there a more elegant means of embedding schema ids in an Avro
> > > >> message (I am new to both currently ;-)?
> > > >>
> > > >> Thanks in advance!
> > > >>
> > > >> Shone
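
For readers following the thread, the framing Jun describes (the first 4 bytes of the md5 of the schema, followed by the Avro bytes) can be sketched roughly as below. This is only an illustration under stated assumptions, not LinkedIn's actual encoder: the function names `schema_id`, `frame`, and `unframe`, the in-memory `registry` dict, and hashing the schema text exactly as given (with no canonicalization) are all hypothetical choices, and the Avro payload is treated as opaque bytes produced elsewhere.

```python
import hashlib

# Length of the schema id prefix: the thread mentions the first 4 bytes of the md5.
SCHEMA_ID_LEN = 4


def schema_id(schema_json: str) -> bytes:
    """Return the first 4 bytes of the MD5 digest of the schema text.

    Assumption: the schema text is hashed as-is, so two schemas that differ
    only in whitespace would get different ids.
    """
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:SCHEMA_ID_LEN]


def frame(schema_json: str, avro_payload: bytes) -> bytes:
    """Prefix already-serialized Avro bytes with the schema id."""
    return schema_id(schema_json) + avro_payload


def unframe(message: bytes, registry: dict) -> tuple:
    """Split a framed message into (schema text, Avro payload bytes).

    `registry` is a hypothetical lookup table mapping schema id -> schema text.
    """
    if len(message) < SCHEMA_ID_LEN:
        raise ValueError("message is shorter than the schema id prefix")
    sid = message[:SCHEMA_ID_LEN]
    payload = message[SCHEMA_ID_LEN:]
    if sid not in registry:
        raise KeyError("unknown schema id: " + sid.hex())
    return registry[sid], payload


if __name__ == "__main__":
    schema = '{"type": "record", "name": "Event", "fields": []}'
    registry = {schema_id(schema): schema}
    framed = frame(schema, b"\x02\x04")  # placeholder bytes standing in for Avro data
    found_schema, payload = unframe(framed, registry)
    print(found_schema == schema, payload)
```

With a fixed-size prefix like this, the consumer does not need a separate envelope schema: it slices off the id, looks up the writer schema, and hands the remaining bytes to a single Avro decoder. A 4-byte prefix keeps overhead low compared with the full 128-bit md5, at the cost of a higher (though still small) chance of id collisions.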