Hi Beam Users,

We have a Dataflow pipeline that reads from and writes to BigQuery. The basic structure of the pipeline is as follows:
query = <<select a bunch of columns from project_1.datasetId.tableId>>

with beam.Pipeline(options=options) as p:
    read_bq_records = (
        p
        | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
            query=query,
            use_standard_sql=True,
        )
    )

    . . . <<do some transformations>> . . .

    write_bq_records = (
        previous_pcol
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "project_2:datasetId.tableId",
            schema=some_schema,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )

The pipeline fails with a message saying it cannot create a temp dataset in project_2 (the destination project) because the service account we use to run Dataflow jobs doesn't have the "bigquery.datasets.create" permission in project_2.

We have tried another pipeline that reads data from GCS and writes to BQ, and it works fine, so the issue seems to be caused by reading from BQ. I'm not familiar with the internal workings of BigQueryIO, but my initial understanding is that it snapshots the source query results into a temp dataset/table in the destination project, from which it then writes the data into the destination table.

It would be very helpful if someone who knows this behaviour, or has hit a similar exception, could shed some light on how BigQueryIO handles these temp datasets.

Thanks & Regards,
Rajnil
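In case it helps narrow things down, one variant we considered but have not verified: newer Beam Python SDKs appear to accept a temp_dataset argument on ReadFromBigQuery, so the connector reuses a dataset created ahead of time instead of creating one itself (the project and dataset names below are placeholders, not our real setup):

```python
# Untested sketch: point ReadFromBigQuery at a pre-created temp dataset
# so the read side does not need bigquery.datasets.create anywhere.
# Assumes an SDK version that supports the temp_dataset parameter.
import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery as bq

# Dataset created manually beforehand, in a project where the
# Dataflow service account already has table-level permissions.
temp_dataset = bq.DatasetReference(
    projectId="project_1",   # placeholder project
    datasetId="beam_temp",   # placeholder, pre-created dataset
)

with beam.Pipeline(options=options) as p:
    read_bq_records = (
        p
        | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
            query=query,
            use_standard_sql=True,
            temp_dataset=temp_dataset,
        )
    )
```

If the failure really comes from the read side creating its temp dataset, pinning the temp dataset this way should make the missing-permission error go away without granting dataset-creation rights on project_2.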