Thank you, Martin, for your help and advice.
We use the Confluent.io Schema Registry for Avro schema versioning
(http://docs.confluent.io/2.0.0/schema-registry/docs/intro.html).
Currently, our preference is to use the POJOs generated by the Avro compiler.
We will evaluate these different solutions.
A third option would be to extend the Schema Registry with the URL of the
generated POJOs (one per schema version), then use the Java class loader
mechanism to load the right classes during deserialization. That way, all
consumers would use the correct version during deserialization.
We would still need to check that the data pipeline of old consumers remains
compatible after each schema evolution.
Best regards.
Youcef HILEM


From: Martin Kleppmann [mailto:mar...@kleppmann.com]
Sent: Tuesday, 15 December 2015 22:11
To: user@avro.apache.org
Subject: Re: add a type to a union

One approach you could use: instead of a union, make a separate field for every 
possible type of message, and make every field a union with null (with default 
value null). Then only fill in the field for the corresponding message type. If 
you do this, a reader using an old version of the schema will simply see all 
fields as null (rather than an exception) if it encounters an unknown message 
type.
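
To make the shape of such a schema concrete, here is a minimal Python sketch that builds it as a plain dict. The field names (announcement, delivery, handling) are illustrative assumptions and only three of the body types from the original schema are shown. It uses only the standard library, so it does not validate the result against Avro itself.

```python
import json

# Hypothetical field names mapped to three of the body types from the thread.
BODY_TYPES = {
    "announcement": "fr.laposte.colis.schema.pivot.announcement.AnnouncementEventMessageBody",
    "delivery": "fr.laposte.colis.schema.pivot.delivery.DeliveryEventMessageBody",
    "handling": "fr.laposte.colis.schema.pivot.handling.HandlingEventMessageBody",
}

def nullable_field(name, type_name):
    # "null" comes first so that the null default is valid for the union.
    return {"name": name, "type": ["null", type_name], "default": None}

def build_message_schema(body_types):
    # The header stays a plain field; every body type gets its own nullable field.
    fields = [{
        "name": "header",
        "type": "fr.laposte.colis.schema.pivot.common.message.MessageHeader",
    }]
    fields.extend(nullable_field(n, t) for n, t in body_types.items())
    return {
        "namespace": "fr.laposte.colis.schema.pivot.message",
        "name": "Message",
        "type": "record",
        "fields": fields,
    }

schema = build_message_schema(BODY_TYPES)
print(json.dumps(schema, indent=2))
```

Adding a new message type then means adding one more nullable field, which an old reader ignores instead of rejecting.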

Another possibility: you can always use the writer schema to decode the data, 
and use the "generic" (dynamically typed) interface for accessing the data. In 
that case, schema evolution is handled by the application code.
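
As a rough illustration of what "handled by the application code" can mean, the sketch below dispatches on the record type name of an already decoded generic value, represented here as a plain dict. The handler table, the field names (recipient, site) and the `process` helper are all hypothetical; decoding itself is assumed to have happened elsewhere with the writer schema.

```python
def process(record_type, record, handlers, on_unknown=None):
    """Route a generically decoded record to the handler for its type.

    Unknown types are passed to on_unknown (if given) instead of raising,
    so a consumer built before a type existed can skip or log it.
    """
    handler = handlers.get(record_type)
    if handler is None:
        return on_unknown(record_type, record) if on_unknown else None
    return handler(record)

# Hypothetical handlers; the field names are assumptions for illustration.
HANDLERS = {
    "DeliveryEventMessageBody": lambda body: "delivered to " + body.get("recipient", "?"),
    "HandlingEventMessageBody": lambda body: "handled at " + body.get("site", "?"),
}

skipped = []
print(process("DeliveryEventMessageBody", {"recipient": "Paris"}, HANDLERS))
print(process("SomeNewerMessageBody", {}, HANDLERS,
              on_unknown=lambda name, rec: skipped.append(name)))
print(skipped)
```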

Putting binary Avro blobs in the database is absolutely fine, as long as you 
attach a schema version to every blob (so that you know the writer schema with 
which it was encoded). You can keep the schemas in a separate database table.

Martin

On 15 Dec 2015, at 16:38, HILEM Youcef <youcef.hi...@laposte.fr> wrote:

Hi Martin,

Thank you for your clear answer.
I will test the example you provided.
In that case, storing binary Avro as a blob in a database is strongly
discouraged: it is very difficult, if not impossible, to deserialize all rows
with a single reader.
Best regards.
Youcef.

From: Martin Kleppmann [mailto:mar...@kleppmann.com]
Sent: Monday, 14 December 2015 22:46
To: user@avro.apache.org
Subject: Re: add a type to a union

Hi Youcef,

Glad you found my old blog post on Avro schema evolution :)

I encourage you to try a simple example, which will make it clearer: 
https://gist.github.com/ept/5fd7c625969248b31e73

In this example, the writer has a union of null, string and long, whereas the 
reader only has a union of null and string. A default value of null is set. If 
the record has a null or string value, it is correctly parsed by the reader. If 
the record has a long value, the reader throws an exception, because it is not 
one of the union datatypes it is expecting.
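
For readers who cannot run the gist, the behaviour can be imitated with a toy model in plain Python. This is not the Avro library: it is a sketch that encodes a union as a zig-zag varint branch index followed by the value, covers only null, string and long, and ignores type promotion and named types. The `UnknownBranch` exception and the function names are made up for this example.

```python
import io

class UnknownBranch(Exception):
    """Raised when the writer's branch has no match in the reader's union."""

def write_long(out, n):
    # Zig-zag, then base-128 variable-length encoding.
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))

def read_long(inp):
    shift, z = 0, 0
    while True:
        b = inp.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7

def encode_union(writer_union, type_name, value):
    out = io.BytesIO()
    write_long(out, writer_union.index(type_name))  # branch position first
    if type_name == "long":
        write_long(out, value)
    elif type_name == "string":
        data = value.encode("utf-8")
        write_long(out, len(data))
        out.write(data)
    elif type_name != "null":  # null is encoded as zero bytes
        raise ValueError("unsupported type: " + type_name)
    return out.getvalue()

def decode_union(data, writer_union, reader_union):
    inp = io.BytesIO(data)
    branch = writer_union[read_long(inp)]
    if branch not in reader_union:
        # The field's default value is never consulted at this point.
        raise UnknownBranch(branch + " is not in reader union " + str(reader_union))
    if branch == "null":
        return None
    if branch == "long":
        return read_long(inp)
    return inp.read(read_long(inp)).decode("utf-8")

WRITER = ["null", "string", "long"]
READER = ["null", "string"]

print(decode_union(encode_union(WRITER, "string", "colis"), WRITER, READER))
print(decode_union(encode_union(WRITER, "null", None), WRITER, READER))
try:
    decode_union(encode_union(WRITER, "long", 42), WRITER, READER)
except UnknownBranch as err:
    print("reader failed:", err)
```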

So the default value unfortunately doesn't help here. If you want to add a new 
branch to a union schema, you have to make sure that all the readers are 
updated with the new schema first, and only then should writers start 
generating data with the new schema.

Hope that helps.
Martin


On 7 Dec 2015, at 22:15, HILEM Youcef <youcef.hi...@laposte.fr> wrote:

Hi,

At La Poste Pôle Colis we use Avro in our new reactive architecture (Kafka,
Spark Streaming, Cassandra, Elasticsearch, Play Framework).

In our model we use a union type for the body attribute, so that a single
schema covers all trace events of a parcel (arrival, departure,
transportation, ...).

Example:
{
  "namespace" : "fr.laposte.colis.schema.pivot.message",
  "name" : "Message",
  "type" : "record",
  "doc" : "Cette structure défini les caractéristiques de base d'un message. Peut(doit) être spécialisée pour un usage particulier",
  "fields" : [
    {
      "name" : "header",
      "type" : "fr.laposte.colis.schema.pivot.common.message.MessageHeader",
      "doc" : "Entête du message"
    },
    {
      "name" : "body",
      "type" : [
        "fr.laposte.colis.schema.pivot.announcement.AnnouncementEventMessageBody",
        "fr.laposte.colis.schema.pivot.delivery.DeliveryEventMessageBody",
        "fr.laposte.colis.schema.pivot.handling.HandlingEventMessageBody",
        "fr.laposte.colis.schema.pivot.crm.CrmEventMessageBody",
        "fr.laposte.colis.schema.pivot.customs.transport.CustomsTransportMessageBody",
        "fr.laposte.colis.schema.pivot.customs.consignment.CustomsContainerEventMessageBody",
        "fr.laposte.colis.schema.pivot.customs.consignment.CustomsParcelEventMessageBody",
        "fr.laposte.colis.schema.pivot.rest.common.Rest",
        "fr.laposte.colis.schema.pivot.reject.RejectMessageBody",
        "fr.laposte.colis.schema.pivot.dpmo.defectrequest.DefectRequestEventMessageBody",
        "fr.laposte.colis.schema.pivot.dpmo.defectresult.DefectResultEventMessageBody",
        "fr.laposte.colis.schema.timeout.TimeoutMessageBody",
        "fr.laposte.colis.schema.notification.Notification"
      ],
      "doc" : "Abstraction du corps de message. Peut-être substitué par tout type dérivé du type MessageBody"
    }
  ]
}

However, as clearly explained at
https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html:
“Union types are powerful, but you must take care when changing them. If you
want to add a type to a union, you first need to update all readers with the
new schema, so that they know what to expect. Only once all readers are
updated, the writers may start putting this new type in the records they
generate.”

My question: is a default value for the field “body” sufficient, so that if the
reader encounters a union branch it does not know about, it can substitute the
default value (see
http://grokbase.com/t/avro/user/11b3bn6r6z/does-extending-union-break-compatibility)?

Thank you in advance for your help.


Post-scriptum La Poste
This message is confidential. Unless otherwise agreed in writing between
you and La Poste, its content in no way constitutes a commitment on the
part of La Poste. Any publication, use, or distribution, even partial,
requires prior authorization. If you are not the intended recipient of
this message, please notify the sender immediately.

