Dear list,

I have what I imagine is a standard setup: a web application generates data in MySQL, which I want to analyze in Hadoop. I run a nightly process to extract the tables of interest, Avroize them, and dump them into HDFS.

This has worked great so far because the tools I'm using make it easy to load a directory tree of Avros with the same schema.

The issue is what to do when schema changes occur in the SQL database. I believe column additions and deletions are handled automatically by the Avro loaders I'm using, but I need to deal with a column rename.
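To be concrete about the rename case: my understanding is that Avro schema resolution lets the reader schema carry "aliases" on a renamed field, so files written under the old column name can still be read under the new one. Here's a rough sketch of what I mean in plain Java against the Avro API (I haven't actually run this, and the table/field names are made up):

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadWithAlias {
    public static void main(String[] args) throws Exception {
        // Reader ("master") schema: the column is now user_id, but older dumps
        // wrote it as uid; the alias maps the old name onto the new one at read time.
        String readerJson =
              "{\"type\": \"record\", \"name\": \"users\", \"fields\": ["
            + " {\"name\": \"user_id\", \"type\": \"long\", \"aliases\": [\"uid\"]},"
            + " {\"name\": \"email\", \"type\": \"string\"}"
            + "]}";
        Schema readerSchema = new Schema.Parser().parse(readerJson);

        // The writer schema comes from the file header; the reader schema drives resolution.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<GenericRecord>(null, readerSchema);
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(new File("users-2013-01-01.avro"), datumReader);
        try {
            for (GenericRecord rec : fileReader) {
                System.out.println(rec.get("user_id") + "\t" + rec.get("email"));
            }
        } finally {
            fileReader.close();
        }
    }
}

If that's right, the old files never need rewriting; the rename only has to be recorded in the reader schema.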

My thinking is this: I could bake each table's schema at ETL time into the Avros as a historical record, but then manually copy that schema out as a "master" schema and apply it to all the Avros for which it's appropriate; then, when a column rename occurs, go back and edit the master schema.
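For the "copy that schema out" step, I was picturing something like this: pull the embedded writer schema out of one of the nightly files, save it as a .avsc, and from then on hand-edit that file (e.g. adding aliases on a rename) and point the loaders at it as the reader schema (assuming the loaders accept an external schema at all). Again just a sketch, file names made up:

import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class DumpWriterSchema {
    public static void main(String[] args) throws Exception {
        // Open one of the nightly files and print the schema stored in its header;
        // redirect the output to users.avsc to seed the master schema.
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<GenericRecord>(new File("users-2013-01-01.avro"),
                                              new GenericDatumReader<GenericRecord>());
        try {
            System.out.println(fileReader.getSchema().toString(true));
        } finally {
            fileReader.close();
        }
    }
}

(I think "java -jar avro-tools.jar getschema <file>" does the same thing, but I haven't checked.)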

I've never used external schemas before, so please correct me if I misunderstand how they work.

Anyone have wisdom to share on this topic? I'd love to hear from anyone who has done this, or has a better solution.

-Mason
