> wrote a generic BigQuery reader or writer

I think I have seen an example here:
https://github.com/the-dagger/dataflow-dynamic-schema/blob/28b7d075c18d6364a67129e56652f452da67a2f6/src/main/java/com/google/cloud/pso/bigquery/BigQuerySchemaMutator.java#L38

This is in Java, but you can try to adapt it for the Python SDK. I don't know
whether that is possible; I use the Java SDK myself for all stream-processing
apps, including Beam apps.
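If it helps, below is a very rough Python sketch of the same idea: patching the
destination table's schema when a row contains fields the table does not have
yet. It assumes the google-cloud-bigquery client library is available on the
workers, and it simply adds unknown columns as NULLABLE STRING, which is only a
placeholder for whatever schema-evolution rules you actually need. It is not a
direct port of the linked Java class.

import apache_beam as beam
from google.cloud import bigquery


class AddMissingColumns(beam.DoFn):
    """Adds a column to the destination table for every unknown key in a row."""

    def __init__(self, table_id):
        # table_id is a placeholder, e.g. "my-project.my_dataset.my_table".
        self.table_id = table_id

    def setup(self):
        self.client = bigquery.Client()

    def process(self, row):
        # Fetching the table per element is simple but slow; a real pipeline
        # would cache the known schema and only call the API on a miss.
        table = self.client.get_table(self.table_id)
        existing = {field.name for field in table.schema}
        missing = [key for key in row if key not in existing]
        if missing:
            # New columns default to NULLABLE STRING here; apply your own
            # business rules for types and modes instead.
            table.schema = list(table.schema) + [
                bigquery.SchemaField(name, "STRING", mode="NULLABLE")
                for name in missing
            ]
            self.client.update_table(table, ["schema"])
        yield row

You would run it as a ParDo in front of the write step, e.g.
rows | beam.ParDo(AddMissingColumns("my-project.my_dataset.my_table"))
     | beam.io.WriteToBigQuery(...).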
Best Regards,
Pavel Solomin
Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
<https://www.linkedin.com/in/pavelsolomin>

On Mon, 9 Aug 2021 at 21:55, Luke Cwik <lc...@google.com> wrote:

> The issue is that the encoding that is passed between transforms needs to
> store the metadata of what was in each column when the data is read, as it
> is passed around in the pipeline. Imagine that column X was a string, was
> then deleted, and then re-added as a datetime. These kinds of schema
> evolutions typically have business-specific rules about what to do.
>
> I believe there was a user who wrote a custom coder that encoded this
> extra information with each row and wrote a generic BigQuery reader or
> writer (don't remember which) that could do something like what you want,
> with limitations around schema evolution and at the performance cost of
> passing the metadata around, but I don't believe this was contributed back
> to the community.
>
> Try searching through the dev [1] / user [2] e-mail archives.
>
> 1: https://lists.apache.org/list.html?d...@beam.apache.org:lte=99M
> 2: https://lists.apache.org/list.html?user@beam.apache.org:lte=99M
>
> On Sun, Aug 1, 2021 at 12:06 PM Rajnil Guha <rajnil94.g...@gmail.com>
> wrote:
>
>> Hi Beam Users,
>>
>> Our pipeline reads Avro files from GCS into Dataflow and writes them
>> into BigQuery tables. I am using the WriteToBigQuery transform to write
>> my PCollection contents into BigQuery.
>> My Avro files contain about 150-200 fields. We have tested our pipeline
>> by providing the field information for all the fields in the TableSchema
>> object within the pipeline code, so every time the schema changes or
>> evolves we need to change our pipeline code.
>> I was wondering if there is any way to provide the BigQuery table schema
>> information outside the pipeline code and read it into the pipeline from
>> there, as it's much easier to maintain that way.
>>
>> Note: We are using the Python SDK to write our pipelines and running on
>> Dataflow.
>>
>> Thanks & Regards
>> Rajnil Guha
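Coming back to the original question: if the goal is only to keep the schema
outside the pipeline code (rather than evolve it at runtime), one minimal
Python SDK sketch is to keep the schema in a JSON file with a
{"fields": [...]} definition and parse it at launch time. The file name,
bucket, and table names below are placeholders.

import apache_beam as beam
from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json

# The JSON file holds the BigQuery schema, e.g.
# {"fields": [{"name": "id", "type": "STRING", "mode": "NULLABLE"}, ...]}
with open("table_schema.json") as f:
    table_schema = parse_table_schema_from_json(f.read())

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "ReadAvro" >> beam.io.ReadFromAvro("gs://my-bucket/input/*.avro")
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",
            schema=table_schema,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )

The same JSON file could also live in GCS and be fetched at
pipeline-construction time, so a schema change only means editing that file,
not the pipeline code.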