Hello,

I have been using Avro with Hadoop and Oozie for months now and I am very
happy with the results.

The only limitation I see now is that we have to specify Avro schemas in
workflow.xml (job.xml):
- avro.input.schema
- avro.output.schema
Since this information is already provided in the Mapper/Reducer
signatures, I see this as redundant. The schema is also embedded in all my
serialized files, which means that the same schema is specified in 3
different places.

From an operational point of view, this is a pain, since any schema
modification (let's say a simple optional field added) forces me to update
many job files. This task is very error-prone, and since we have a large
number of jobs, it generates a lot of work.

The only solution I see now would be a find/replace step in the build
script, but I hope to find a better one, either by providing some generic
schemas to the job file or by disabling schema validation in the job.
Any help will be appreciated!
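
To make the find/replace fallback concrete, here is a minimal sketch of
what I have in mind. It assumes each workflow.xml is generated from a
template that contains placeholder tokens such as @@INPUT_SCHEMA@@, and
that the schema JSON is kept in a single place. The class name
SchemaInjector and the @@...@@ token syntax are made up for illustration;
none of this is Avro or Oozie API.

```java
import java.util.Map;

public class SchemaInjector {

    // Escape the characters that are special in XML text so that a JSON
    // schema can sit safely inside a <value> element.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Replace every @@NAME@@ token in the template with the escaped schema
    // registered under NAME. Tokens without a matching entry are left as-is.
    static String inject(String template, Map<String, String> schemas) {
        String out = template;
        for (Map.Entry<String, String> e : schemas.entrySet()) {
            out = out.replace("@@" + e.getKey() + "@@", xmlEscape(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical template fragment; a real build would read it from disk.
        String template =
            "<property>\n" +
            "  <name>avro.input.schema</name>\n" +
            "  <value>@@INPUT_SCHEMA@@</value>\n" +
            "</property>";
        // Hypothetical schema; a real build would read it from one .avsc file.
        String schema = "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[]}";
        System.out.println(inject(template, Map.of("INPUT_SCHEMA", schema)));
    }
}
```

This keeps the schema in one file, but it is still a build-time workaround
and every job file has to be regenerated after each schema change.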

-- 
Julien Muller
