Hi Ryan,

Thanks for your explanation. I'm now thinking that the design of Avro treats data and schemas as carefully planned things: changes are planned through versioning, and duplicated schemas are avoided (when the positioning makes sense).

I have a roundabout way of learning. Sometimes I am working with data and find it convenient to transform it programmatically and then try to obtain a schema from the result. I also think schemas can become cumbersome when many fields are involved in intricate patterns; maybe there are other forms better suited to that.

Regarding your proposals, 1 and 2 seem reasonable to me. But someone like myself might also not fully understand the design of Avro. A better exception message, or some kind of lead to help armchair programmers understand the exception, would be useful.

Thanks for mentioning the copy operation. Finally, I do see something about aliases.

Thanks,
Colin

On Fri, Sep 18, 2020 at 5:32 AM Ryan Skraba <[email protected]> wrote:
>
> Hello Colin, you've hit one bit of fussiness with the Java SDK... you
> can't reuse a Schema.Field object in two Records, because a field
> knows its own position in the record[1]. If a field were to belong to
> two records at different positions, this method would have an
> ambiguous response.
>
> As a workaround, since Avro 1.9, there's a copy constructor that you
> can use to clone the field:
>
> List<Schema.Field> clonedFields = existingFields.stream()
>     .map(f -> new Schema.Field(f, f.schema()))
>     .collect(Collectors.toList());
>
> That being said, I don't see any reason we MUST throw an exception.
> There are a couple of alternative strategies we could use in the Java
> SDK:
>
> 1. If the position is the same in both records, allow the field to be
> reused (which enables cloning use cases).
>
> 2. Make a copy of the field to reuse internally if the position is
> already set (probably OK, since it's supposed to be immutable).
>
> 3. Allow the field to be reused, and throw the exception only if
> someone calls the position() method later.
>
> Any of those sound like a useful change for your use case? Don't
> hesitate to create a JIRA or contribution if you like!
>
> All my best, Ryan
>
> On Fri, Sep 18, 2020 at 8:27 AM Colin Williams
> <[email protected]> wrote:
> >
> > Hello,
> >
> > I'm trying to understand how to work with Avro records and schemas
> > programmatically. I first tried to create a new schema and records
> > based on existing records, but with a different name / namespace.
> > It seems I don't understand getFields() or createRecord(...). Why
> > can't I use the fields obtained from getFields() in createRecord()?
> > How would I go about this properly?
> >
> > // for an existing record already present
> > GenericRecord someRecord
> >
> > // get a list of existing fields
> > List<Schema.Field> existingFields = someRecord.getSchema().getFields();
> >
> > // schema for new record with existing fields
> > Schema updatedSchema = Schema.createRecord("UpdatedName",
> >     "", "avro.com.example.namespace", false, existingFields);
> >
> > ^^ throws an exception ^^
> >
> > /* Caused by: org.apache.avro.AvroRuntimeException: Field already
> > used: eventMetadata type:UNION pos:0
> > at org.apache.avro.Schema$RecordSchema.setFields(Schema.java:888)
> > at org.apache.avro.Schema$RecordSchema.<init>(Schema.java:856)
> > at org.apache.avro.Schema.createRecord(Schema.java:217)
> > */
> >
> > final int length = existingFields.size();
> >
> > GenericRecord clonedRecord = new GenericData.Record(updatedSchema);
> > for (int i = 0; i < length; i++) {
> >     final Schema.Field field = existingFields.get(i);
> >     clonedRecord.put(i, someRecord.get(i));
> > }
> >
> >
> > Best Regards,
> >
> > Colin Williams
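
For readers following the thread, the reason behind the "Field already used" exception can be illustrated with a small, self-contained sketch. It deliberately does not use Avro: `Field`, `Record`, `cloneFields`, and the field name `payload` are hypothetical stand-ins invented for this illustration, and the check shown is only a simplified model of the behaviour described above (a field remembers its position, so attaching it to a second record is rejected), not the library's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldReuseSketch {

    /** Hypothetical stand-in for a schema field that remembers its position. */
    public static final class Field {
        public final String name;
        public int position = -1; // -1 means "not attached to any record yet"

        public Field(String name) {
            this.name = name;
        }

        /** Copy constructor: same name, but the position is reset. */
        public Field(Field other) {
            this(other.name);
        }
    }

    /** Hypothetical stand-in for a record schema that assigns positions to its fields. */
    public static final class Record {
        public final String name;
        public final List<Field> fields = new ArrayList<>();

        public Record(String name, List<Field> newFields) {
            this.name = name;
            int i = 0;
            for (Field f : newFields) {
                if (f.position != -1) {
                    // The field already belongs to a record, so its position would be ambiguous.
                    throw new IllegalStateException(
                            "Field already used: " + f.name + " pos:" + f.position);
                }
                f.position = i++;
                fields.add(f);
            }
        }
    }

    /** Copies every field so the copies can be attached to a new record. */
    public static List<Field> cloneFields(List<Field> existing) {
        List<Field> copies = new ArrayList<>();
        for (Field f : existing) {
            copies.add(new Field(f));
        }
        return copies;
    }

    public static void main(String[] args) {
        Record original = new Record("Original",
                Arrays.asList(new Field("eventMetadata"), new Field("payload")));

        try {
            // Reusing the same Field objects fails: they already carry a position.
            new Record("UpdatedName", original.fields);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Cloning first gives fresh, unattached fields, so this succeeds.
        Record updated = new Record("UpdatedName", cloneFields(original.fields));
        System.out.println(updated.name + " has " + updated.fields.size() + " fields");
    }
}
```

With Avro itself, the equivalent of `cloneFields` is the copy constructor quoted above (`new Schema.Field(f, f.schema())`, available since Avro 1.9); the cloned list, rather than the result of getFields(), is what gets passed to createRecord().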
