Re: Versioning Schemas

David Arthur Fri, 14 Jun 2013 07:05:31 -0700

I've done this in the past, and it worked out well. I stored the Avro schemas in ZooKeeper, each with an integer id, and prefixed each message with that id. When you register a new schema, you have to make sure that it resolves with the current version (ResolvingDecoder helps with this).
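A minimal sketch of the approach described above. It rests on several assumptions. A plain in-memory dict stands in for ZooKeeper, and no ZooKeeper client is used. The id is written as a 4-byte big-endian prefix, since the email doesn't say how wide it is. "Resolves with the current version" is read as compatibility in both directions. The `can_read` check is a deliberately simplified subset of Avro's record resolution rules: it handles fields added or removed with defaults, but not type promotion, unions, aliases, or nested records. The Avro library's own resolution, which ResolvingDecoder applies, covers the full rules.

```python
import json
import struct


def can_read(reader: dict, writer: dict) -> bool:
    """Simplified subset of Avro record resolution: can data written with
    `writer` be read with `reader`? Writer-only fields are skipped, reader-only
    fields need a default, and shared fields must have identical types
    (no type promotion, unions, aliases, or nested records are handled)."""
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for field in reader["fields"]:
        match = writer_fields.get(field["name"])
        if match is None:
            if "default" not in field:
                return False
        elif match["type"] != field["type"]:
            return False
    return True


class SchemaRegistry:
    """Hypothetical in-memory stand-in for a ZooKeeper-backed registry:
    schemas get sequential integer ids starting at 1."""

    def __init__(self) -> None:
        self._schemas: dict = {}

    def register(self, schema_json: str) -> int:
        schema = json.loads(schema_json)
        if self._schemas:
            current = self._schemas[max(self._schemas)]
            # Assumption: require compatibility in both directions, so new
            # readers handle old data and old readers handle new data.
            if not (can_read(schema, current) and can_read(current, schema)):
                raise ValueError("schema does not resolve with the current version")
        schema_id = max(self._schemas, default=0) + 1
        self._schemas[schema_id] = schema
        return schema_id

    def get(self, schema_id: int) -> dict:
        return self._schemas[schema_id]


def frame(schema_id: int, avro_bytes: bytes) -> bytes:
    # Assumed layout: 4-byte big-endian schema id, then the Avro-encoded body.
    return struct.pack(">I", schema_id) + avro_bytes


def unframe(message: bytes):
    # Returns (schema_id, avro_bytes); the id selects the writer schema.
    (schema_id,) = struct.unpack(">I", message[:4])
    return schema_id, message[4:]
```

In this sketch, adding a field with a default registers cleanly. Changing a field's type is rejected before any message is written with the new id.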


-David


On 6/13/13 4:07 AM, Shone Sadler wrote:

Thanks Jun & Phil!

Shone


On Thu, Jun 13, 2013 at 12:00 AM, Jun Rao <jun...@gmail.com> wrote:

Yes, we just have a customized encoder that writes the first 4 bytes of the MD5
of the schema, followed by the Avro bytes.
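A rough sketch of the framing described above, assuming the MD5 is computed over the exact UTF-8 schema text. The actual encoder's details (for example, whether the schema is normalized first) aren't given in the thread. With this assumption, two schemas that differ only in whitespace get different ids. Truncating the digest to 4 bytes also makes collisions possible, though unlikely with a modest number of schemas.

```python
import hashlib


def fingerprint(schema_json: str) -> bytes:
    # First 4 bytes of the MD5 digest of the schema text.
    return hashlib.md5(schema_json.encode("utf-8")).digest()[:4]


def encode(schema_json: str, avro_bytes: bytes) -> bytes:
    # Message layout: 4-byte schema fingerprint, then the Avro-encoded body.
    return fingerprint(schema_json) + avro_bytes


def decode(message: bytes, known_schemas: dict):
    # known_schemas maps fingerprint -> schema JSON (e.g. loaded from a registry).
    prefix, body = message[:4], message[4:]
    schema_json = known_schemas.get(prefix)
    if schema_json is None:
        raise KeyError(f"unknown schema fingerprint {prefix.hex()}")
    return schema_json, body
```

Because the prefix is raw bytes rather than an Avro field, the reader strips it before running a single Avro decode of the body.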

Thanks,

Jun


On Wed, Jun 12, 2013 at 9:50 AM, Shone Sadler <shone.sad...@gmail.com> wrote:
Jun,
I like the idea of an explicit version field if the schema can be derived
from the topic name itself. Storing it (say, 1-4 bytes) would take less
space than a 128-bit MD5, at the added cost of managing the version number.

Is it correct to assume, then, that your applications are using two schemas:
one system-level schema to deserialize the schema id and the bytes of the
application message, and a second, application-level schema to deserialize
those bytes?

Thanks again!
Shone


On Wed, Jun 12, 2013 at 11:31 AM, Jun Rao <jun...@gmail.com> wrote:

Actually, currently our schema id is the MD5 of the schema itself. Not
fully sure how this compares with an explicit version field in the schema.

Thanks,

Jun


On Wed, Jun 12, 2013 at 8:29 AM, Jun Rao <jun...@gmail.com> wrote:

At LinkedIn, we are using option 2.

Thanks,

Jun


On Wed, Jun 12, 2013 at 7:14 AM, Shone Sadler <shone.sad...@gmail.com> wrote:

Hello everyone,

After searching the mailing list for best practices on integrating Avro
with Kafka, I found at least three options for handling the Avro schema:
1) embedding the entire schema within the message, 2) embedding a unique
identifier for the schema in the message, and 3) deriving the schema from
the topic/resource name.
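A small sketch comparing how much each option adds to a message. The schema, the topic name `page-views`, the 4-byte length prefix in option 1, and the placeholder body are all hypothetical. Option 2 uses a full 16-byte MD5 as the id, which is only one possible choice. The body is treated as opaque, already-encoded Avro bytes.

```python
import hashlib
import struct

# Hypothetical schema and topic, for illustration only.
SCHEMA = '{"type":"record","name":"PageView","fields":[{"name":"url","type":"string"}]}'
TOPIC_SCHEMAS = {"page-views": SCHEMA}


def option1_embed_schema(body: bytes) -> bytes:
    # The full schema text (length-prefixed) travels with every message.
    schema_bytes = SCHEMA.encode("utf-8")
    return struct.pack(">I", len(schema_bytes)) + schema_bytes + body


def option2_schema_id(body: bytes) -> bytes:
    # A 16-byte MD5 of the schema text identifies it; readers look it up.
    return hashlib.md5(SCHEMA.encode("utf-8")).digest() + body


def option3_topic_schema(body: bytes) -> bytes:
    # Nothing is added; readers use TOPIC_SCHEMAS[topic] instead.
    return body


body = b"\x06abc"  # placeholder for an Avro-encoded record
for encode in (option1_embed_schema, option2_schema_id, option3_topic_schema):
    print(encode.__name__, "overhead:", len(encode(body)) - len(body), "bytes")
```

Option 1's overhead grows with the schema. Option 2's is fixed. Option 3 adds nothing, but it ties every message on a topic to one schema (or to a version tracked outside the message).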

Option 2 appears to be the best option in terms of both efficiency and
flexibility. However, from a programming perspective it complicates the
solution, since it needs both an envelope schema (containing a "schema id"
field and a "bytes" field for the record data) and a message schema
(containing the application-specific message fields). This requires two
levels of serialization/deserialization.
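To make the two levels concrete, here is a hand-written sketch of encoding such an envelope with Avro's binary encoding rules. Ints and longs are zig-zag varints. `bytes` is a long length followed by the data. A record is its fields concatenated in order. This is not the Avro library's API, and the field names `schema_id` and `payload` are hypothetical. The payload is assumed to be an application record that was already Avro-encoded with the message schema.

```python
# Hypothetical envelope schema:
# {"type": "record", "name": "Envelope",
#  "fields": [{"name": "schema_id", "type": "int"},
#             {"name": "payload", "type": "bytes"}]}


def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint, low bits first."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps 0, -1, 1, -2 ... to 0, 1, 2, 3 ...
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Decode one zig-zag varint at `pos`; returns (value, next_pos)."""
    shift = z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos


def encode_envelope(schema_id: int, payload: bytes) -> bytes:
    # Field 1 (int), then field 2 (bytes = length + raw data).
    return zigzag_varint(schema_id) + zigzag_varint(len(payload)) + payload


def decode_envelope(buf: bytes):
    schema_id, pos = read_varint(buf, 0)
    length, pos = read_varint(buf, pos)
    return schema_id, buf[pos:pos + length]
```

Reading a message therefore takes two passes. First `decode_envelope` recovers the id and the payload. Then the payload goes through a second Avro decode with the schema that the id names.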
Questions:
1) How are others dealing with versioning of schemas?
2) Is there a more elegant way of embedding schema ids in an Avro
message? (I am new to both Avro and Kafka ;-)

Thanks in advance!

Shone
