Dear list,

I have what I imagine is a standard setup: a web application generates data in MySQL, which I want to analyze in Hadoop. I run a nightly process to extract the tables of interest, Avroize them, and dump them into HDFS.

This has worked great so far because the tools I'm using make it easy to load a directory tree of Avros with the same schema.

The issue is what to do when schema changes occur in the SQL database. I believe column additions and deletions are handled automatically by the Avro loaders I'm using, but I need to deal with a column rename.
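To be concrete about the rename case: my understanding is that Avro schema resolution lets the reader schema carry "aliases" on a renamed field, so files written under the old column name can still be read under the new one. Here's a rough sketch of what I mean in plain Java against the Avro API (I haven't actually run this, and the table/field names are made up):

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadWithAlias {
    public static void main(String[] args) throws Exception {
        // Reader ("master") schema: the column is now user_id, but older dumps
        // wrote it as uid; the alias maps the old name onto the new one at read time.
        String readerJson =
              "{\"type\": \"record\", \"name\": \"users\", \"fields\": ["
            + " {\"name\": \"user_id\", \"type\": \"long\", \"aliases\": [\"uid\"]},"
            + " {\"name\": \"email\", \"type\": \"string\"}"
            + "]}";
        Schema readerSchema = new Schema.Parser().parse(readerJson);

        // The writer schema comes from the file header; the reader schema drives resolution.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<GenericRecord>(null, readerSchema);
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(new File("users-2013-01-01.avro"), datumReader);
        try {
            for (GenericRecord rec : fileReader) {
                System.out.println(rec.get("user_id") + "\t" + rec.get("email"));
            }
        } finally {
            fileReader.close();
        }
    }
}

If that's right, the old files never need rewriting; the rename only has to be recorded in the reader schema.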

My thinking is this: I could bake each table's schema at ETL time into the Avros as a historical record, but then manually copy that schema out as a "master" schema and apply it to all the Avros for which it's appropriate; then, when a column rename occurs, go back and edit the master schema.
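For the "copy that schema out" step, I was picturing something like this: pull the embedded writer schema out of one of the nightly files, save it as a .avsc, and from then on hand-edit that file (e.g. adding aliases on a rename) and point the loaders at it as the reader schema (assuming the loaders accept an external schema at all). Again just a sketch, file names made up:

import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class DumpWriterSchema {
    public static void main(String[] args) throws Exception {
        // Open one of the nightly files and print the schema stored in its header;
        // redirect the output to users.avsc to seed the master schema.
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(new File("users-2013-01-01.avro"),
                                              new GenericDatumReader<GenericRecord>());
        try {
            System.out.println(fileReader.getSchema().toString(true));
        } finally {
            fileReader.close();
        }
    }
}

(I think "java -jar avro-tools.jar getschema <file>" does the same thing, but I haven't checked.)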

I've never used external schemas before, so please correct me if I misunderstand how they work.

Anyone have wisdom to share on this topic? I'd love to hear from anyone who has done this, or has a better solution.

-Mason
