Hello, I have been using Avro with Hadoop and Oozie for months now, and I am very happy with the results.
The only limitation I see at this point is that we have to specify the Avro schemas in workflow.xml (job.xml):
- avro.input.schema
- avro.output.schema

Since this information is already provided by the Mapper/Reducer signatures, this seems redundant to me. The schema is also stored in every serialized file, which means the same schema is specified in three different places.

From an operations point of view this is a pain: any schema modification (say, adding a simple optional field) forces me to update many job files. This is very error-prone, and since we have a large number of jobs, it generates a lot of work.

The only solution I see right now would be a find/replace in the build script, but I hope there is a better one, either by providing some generic schemas in the job file or by finding a way to disable schema validation in the job.

Any help would be appreciated!

--
Julien Muller
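The find/replace idea mentioned above can at least be driven from a single source of truth. Below is a minimal Java sketch, assuming each schema is kept in one shared .avsc file that the build reads. It renders the avro.input.schema and avro.output.schema entries as Hadoop-style `<property>` elements that could be pasted into an Oozie action's `<configuration>` block. The class and method names (`AvroJobProperties`, `schemaProperties`) and the example `User` schema are hypothetical. This only generates job configuration text; it does not disable or bypass Avro's schema handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class AvroJobProperties {

    // Hypothetical example schema: a User record with one optional field
    // (a union with null plus a null default, the usual Avro way to add an optional field).
    static final String EXAMPLE_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    // Escape the characters that cannot appear verbatim in XML text content.
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render one Hadoop-style configuration <property> element.
    public static String property(String name, String value) {
        return "<property>\n"
            + "  <name>" + escapeXml(name) + "</name>\n"
            + "  <value>" + escapeXml(value) + "</value>\n"
            + "</property>\n";
    }

    // Render the input and output schema properties from schema JSON kept in one place.
    public static String schemaProperties(String inputSchemaJson, String outputSchemaJson) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("avro.input.schema", inputSchemaJson.trim());
        props.put("avro.output.schema", outputSchemaJson.trim());
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(property(e.getKey(), e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Read the shared schema file if a path is given, otherwise use the example schema.
        String schema = args.length > 0 ? Files.readString(Path.of(args[0])) : EXAMPLE_SCHEMA;
        // This assumes the job reads and writes the same record type; pass two schemas if they differ.
        System.out.print(schemaProperties(schema, schema));
    }
}
```

Invoked from the build as, for example, `java AvroJobProperties user.avsc` (a hypothetical file name), adding an optional field would then mean editing only that one schema file and regenerating the job files, rather than hand-editing each workflow.xml.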