I am trying to read a set of Parquet files from GCS, and it is failing because the column order in the Parquet files does not match the field order defined by my SchemaBuilder.
For example, I am defining my schema as follows:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public static Schema buildSchema() {
    SchemaBuilder.FieldAssembler<Schema> builder =
        SchemaBuilder.record("schema").fields();
    builder.optionalString("A");
    builder.optionalLong("B");
    builder.optionalDouble("C");
    builder.optionalDouble("D");
    return builder.endRecord();
}
```

However, the Parquet files I am attempting to read have their columns in the order `D, A, B, C`.

When I try to read the Parquet files using:

```java
pipeline.apply("Read parquet", ParquetIO.read(schema).from(path))
...
```

it fails with:

```
java.lang.IllegalArgumentException: Unable to encode element '{"D": 1.0, "A": "stringA", "B": 100, "C": 2.0}' with coder 'org.apache.beam.sdk.coders.AvroGenericCoder@f7996a3b'.
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null","string"]: 1.0
```

My question: is there a way to have ParquetIO ignore the order of the columns, or does the column order have to match the schema's field order exactly?

For reference, I am executing this pipeline on Dataflow with SDK version 2.23.0.

Thanks for your help,
Joe