Thank you, Scott. That has cleared up some misunderstanding on my part. I want to emit both records as a Pair, and have now implemented that by using a Record schema holding two sub-records, one for type A and one for type B, so I can just write the relevant datum to the correct sub-record, which gives me exactly what I need.
Andrew

>________________________________
> From: Scott Carey <scottca...@apache.org>
>To: "user@avro.apache.org" <user@avro.apache.org>; Andrew Kenworthy <adwkenwor...@yahoo.com>
>Sent: Thursday, December 8, 2011 6:45 PM
>Subject: Re: Collecting union-ed Records in AvroReducer
>
>On 12/8/11 4:10 AM, "Andrew Kenworthy" <adwkenwor...@yahoo.com> wrote:
>
>>Hallo,
>>
>>is it possible to write/collect a union-ed record from an avro reducer?
>>
>>I have a reduce class (extending AvroReducer), and the output schema is a
>>union schema of record type A and record type B. In the reduce logic I
>>want to combine instances of A and B in the same datum, passing it to my
>>Avrocollector. My code looks a bit like this:
>>
>
>If both records were created in the reducer, you can call collect twice,
>once with each record. Collect in general can be called as many times as
>you wish.
>
>If you want to combine two records into a single datum rather than emit
>multiple datums, you do not want a union, you need a Record. A union is a
>single datum that may be only one of its branches in a single datum.
>
>In short, do you want to emit both records individually or as a pair? If
>it is a pair, you need a Record, if it is multiple outputs or either/or,
>it is a Union.
>
>>
>>Record unionRecord = new GenericData.Record(myUnionSchema); // not legal!
>>unionRecord.put("type A", recordA);
>>unionRecord.put("type B", recordB);
>>
>>collector.collect(unionRecord);
>>
>>but GenericData.Record constructor expects a Record Schema. How can I
>>write both records such that they appear in the same output datum?
>
>If your output is either one type or another, see Doug's answer.
>
>for multiple datums, it is
>
>output schema is a union of two records (a datum is either one or the
>other):
>["RecordA", "RecordB"]
>then the code is:
>
>collector.collect(recordA);
>collector.collect(recordB);
>
>
>If you want a single datum that contains both a RecordA and a RecordB you
>need to have your output schema be a Record with two fields:
>
>{"type":"record", "fields":[
> {"name":"recordA", "type":"RecordA"},
> {"name":"recordB", "type":"RecordB"}
>]}
>
>And you would use this record schema to create the GenericRecord, and then
>populate the fields with the inner records, then call collect once with
>the outer record.
>
>Another choice is to have the output be an Avro array of the union type
>that may have any number of RecordA's and RecordB's in a single datum.
>
>>
>>Andrew
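
For anyone finding this thread in the archive, here is a minimal sketch of the "pair as a Record" approach discussed above. It is an illustration, not the code actually used: the outer record name "Pair", the inner fields "id" and "count", and the class and method names are all hypothetical. The outer record is given a name because Avro record schemas require one.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class PairRecordSketch {

    // Hypothetical schema: an outer record "Pair" with one field per inner record.
    // The inner field names ("id", "count") are made up for this example.
    static final String PAIR_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Pair\",\"fields\":["
      + "{\"name\":\"recordA\",\"type\":{\"type\":\"record\",\"name\":\"RecordA\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}},"
      + "{\"name\":\"recordB\",\"type\":{\"type\":\"record\",\"name\":\"RecordB\","
      + "\"fields\":[{\"name\":\"count\",\"type\":\"int\"}]}}"
      + "]}";

    // Builds one outer datum that holds both inner records.
    static GenericRecord buildPair(String id, int count) {
        Schema pairSchema = new Schema.Parser().parse(PAIR_SCHEMA_JSON);

        // Each inner record is created from the schema of its field.
        GenericRecord a = new GenericData.Record(pairSchema.getField("recordA").schema());
        a.put("id", id);

        GenericRecord b = new GenericData.Record(pairSchema.getField("recordB").schema());
        b.put("count", count);

        // The outer record uses the Record schema, not a union schema.
        GenericRecord pair = new GenericData.Record(pairSchema);
        pair.put("recordA", a);
        pair.put("recordB", b);
        return pair;
    }

    public static void main(String[] args) {
        GenericRecord pair = buildPair("x", 3);
        // Inside a reducer this datum would be emitted once:
        //   collector.collect(pair);
        // Here we only check that it matches its schema.
        System.out.println(GenericData.get().validate(pair.getSchema(), pair));
    }
}
```

In a reducer, the job's output schema would be the same Pair schema, and collect would be called once per outer datum rather than once per inner record.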