Hi Beam Users,

We have a Dataflow pipeline which reads and writes data from and into
BigQuery. The basic structure of the pipeline is as follows:

query = <<select a bunch of columns from project_1.datasetId.tableId>>

with beam.Pipeline(options=options) as p:
    read_bq_records = (
        p
        | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
            query=query,
            use_standard_sql=True,
        )
    )

    <<do some transformations>>

    write_bq_records = (
        previous_pcol
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "project_2:datasetId.tableId",
            schema=some_schema,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )

Our pipeline fails with a message that it is not able to create a temp
dataset in project_2 (the destination project) because the service account
we are using to run Dataflow jobs does not have the
"bigquery.datasets.create" permission on project_2. We have tried another
pipeline that reads data from GCS and writes to BQ, and it works fine, so
the issue seems to be specific to reading data from BQ.
I am not exactly aware of the internal workings of BigQueryIO, but my
initial understanding is that it tries to snapshot the source table into a
temp dataset/table in the destination project, from where it then writes
the data into the destination table.
It would be very helpful if someone who knows this behaviour, or has faced
a similar exception, could shed some light on how BigQueryIO works here.

Thanks & Regards
Rajnil
