On 10/10/11 11:41 AM, "Julien Muller" <julien.mul...@ezako.com> wrote:
> Hello,
>
> Thanks for your answer, let me try to clarify my context a bit:
>
>> I'm not all that familiar with how Oozie interacts with Avro.
> Let's get Oozie out of the picture. I use job.xml files to configure Jobs.
> This means I do not have any JobConf object and I cannot use AvroJob.
> Therefore I directly write the job properties (the same ones AvroJob outputs).
>
>> The Job must set its avro.input.schema and avro.output.schema properties;
>> this can be done in code (see the unit tests in the Avro mapred project for
>> examples),
> The solution I have now is basically based on the Avro mapred unit tests. But
> in my context, it is not an option to code (using the SCHEMA$ property) at the
> job configuration level.
> Where you code:
> AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
> I have to copy the entire schema into the job.xml file. And I have to update it
> every time my schema gets updated.
> I hope I can find a better solution.

I suppose that in AvroJob we could transmit only the class name in a property, and use that to look up the schema for generated classes using reflection. Could you do something similar?

I don't think it is possible to avoid configuring at least some sort of pointer to where the schema is. This could be via a property, or, if you already have the job class, an annotation on that class.

>
>> and if you are using SpecificRecords and DataFiles the schema is available to
>> the code where necessary.
> I am not sure what you mean here. I am using SpecificRecords and would like to
> avoid specifying avro.input.schema, since this info is already here in the
> specific record.

Potentially the AvroMapper / AvroReducer could have a fall-back for obtaining the schema if the property is not set: reflection on a class name, or an annotation.

If this looks like it is an enhancement request for Avro (or a bug), please file a JIRA ticket.

Thanks!
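The fall-back described above (use the configured schema if it is set, otherwise look the schema up by class name through reflection) could look roughly like the sketch below. It is only an illustration, not Avro's implementation: `ExampleRecord`, `SchemaLookup`, `schemaFor` and `resolveSchema` are hypothetical names, and the stand-in record keeps its schema as a JSON string so the sketch has no dependencies. Real Avro-generated classes expose a static `SCHEMA$` field of type `org.apache.avro.Schema`, which is what the reflection step would read in practice.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an Avro-generated SpecificRecord class.
// Assumption: a plain String replaces org.apache.avro.Schema to keep this dependency-free.
class ExampleRecord {
    public static final String SCHEMA$ =
        "{\"type\":\"record\",\"name\":\"ExampleRecord\",\"fields\":[]}";
}

// Hypothetical helper, not part of Avro.
class SchemaLookup {

    // Load the class by name and read its public static SCHEMA$ field.
    static String schemaFor(String className) {
        try {
            Class<?> recordClass = Class.forName(className);
            Field field = recordClass.getField("SCHEMA$");
            if (!Modifier.isStatic(field.getModifiers())) {
                throw new IllegalStateException("SCHEMA$ is not static on " + className);
            }
            return String.valueOf(field.get(null));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No schema found for " + className, e);
        }
    }

    // Prefer the explicitly configured schema; fall back to the class-name lookup.
    static String resolveSchema(String configuredSchema, String recordClassName) {
        if (configuredSchema != null && !configuredSchema.isEmpty()) {
            return configuredSchema;
        }
        return schemaFor(recordClassName);
    }

    public static void main(String[] args) {
        // No schema configured, so the schema comes from the record class itself.
        System.out.println(resolveSchema(null, ExampleRecord.class.getName()));
    }
}
```

With something along these lines, a job file would only need to carry a class name, and a schema change would be picked up when the generated class is rebuilt rather than by editing every job file.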
>
> Thanks,
>
> Julien Muller
>
> 2011/10/10 Scott Carey <scottca...@apache.org>
>> I'm not all that familiar with how Oozie interacts with Avro.
>>
>> The Job must set its avro.input.schema and avro.output.schema properties;
>> this can be done in code (see the unit tests in the Avro mapred project for
>> examples), and if you are using SpecificRecords and DataFiles the schema is
>> available to the code where necessary.
>>
>> On 10/10/11 5:41 AM, "Julien Muller" <julien.mul...@ezako.com> wrote:
>>
>>> Hello,
>>>
>>> I have been using Avro with Hadoop and Oozie for months now and I am very
>>> happy with the results.
>>>
>>> The only point I see as a limitation now is that we specify Avro schemas in
>>> workflow.xml (job.xml):
>>> - avro.input.schema
>>> - avro.output.schema
>>> Since this info is already provided in Mapper/Reducer signatures, I see this
>>> as redundant. The schema is also present in all my serialized files, which
>>> means that the schema is specified in 3 different places.
>>>
>>> From an operational point of view, this is a pain, since any schema modification
>>> (let's say a simple optional field added) forces me to update many job
>>> files. This task is very error prone and since we have a large number of
>>> jobs, it generates a lot of work.
>>>
>>> The only solution I see now would be to find/replace in the build script,
>>> but I hope I could find a better solution by providing some generic schemas
>>> to the job file, or find a way to deactivate schema validation in the job.
>>> Any help will be appreciated!
>>>
>>> --
>>> Julien Muller