Graham Polley created BEAM-6910:
-----------------------------------

             Summary: Beam does not consider BigQuery's processing location 
when getting query results
                 Key: BEAM-6910
                 URL: https://issues.apache.org/jira/browse/BEAM-6910
             Project: Beam
          Issue Type: Bug
          Components: dependencies, runner-dataflow, sdk-py-core
    Affects Versions: 2.11.0
         Environment: Python
            Reporter: Graham Polley


When using the BigQuery source with a query in a pipeline, the "processing 
location" is not taken into consideration and the pipeline fails.

For example, consider the following pipeline, which uses `BigQuerySource` to 
read from BigQuery with a SQL query. The BigQuery dataset and tables are 
located in "australia-southeast1". The query is submitted successfully ([Beam 
works out the processing location by examining the first table referenced in 
the query and sets it 
accordingly|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221]),
 but when Beam polls for the job status after submission, the request fails 
because Beam doesn't set `location` to "australia-southeast1", which BigQuery 
requires:

 

 
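{code:python}
import apache_beam as beam

# p is assumed to be an existing beam.Pipeline; the table lives in australia-southeast1.
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
    use_standard_sql=True,
    query='SELECT * FROM `a_project_id.dataset_in_australia.table_in_australia`'))
{code}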
 

 

 
{code:java}
HttpNotFoundError: HttpError accessing 
<https://www.googleapis.com/bigquery/v2/projects/a_project_id/queries/5ad9cc803baa432290b6cd0203f556d9?alt=json&maxResults=10000>:
 response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; 
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', 
'-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 
2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; 
ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; 
charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
        "message": "Not found: Job 
a_project_id:5ad9cc803baa432290b6cd0203f556d9",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}
{code}
 

The problem can be seen here:

[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571]

[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357]

The location of the job (in this case "australia-southeast1") needs to be 
set or inferred (or exposed via the API) when polling for results; otherwise 
the request fails with the 404 shown above.
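
As a rough illustration of what the results request would need to look like, the sketch below builds the same URL that appears in the error above, with the `location` query parameter added. The helper name `get_query_results_url` is hypothetical and is not part of Beam; it only shows the request shape, under the assumption that the job location is already known from the job submission step.

```python
from urllib.parse import urlencode


def get_query_results_url(project_id, job_id, location=None, max_results=10000):
    """Hypothetical helper (not a Beam API): build a getQueryResults URL.

    The URL shape mirrors the one in the error above. When `location` is
    given, it is passed as a query parameter so that BigQuery can find jobs
    that ran outside the default processing locations.
    """
    params = {'alt': 'json', 'maxResults': max_results}
    if location is not None:
        params['location'] = location
    return 'https://www.googleapis.com/bigquery/v2/projects/%s/queries/%s?%s' % (
        project_id, job_id, urlencode(params))


# Request as currently issued (no location), which returns 404 for this job.
url_without_location = get_query_results_url(
    'a_project_id', '5ad9cc803baa432290b6cd0203f556d9')

# Request with the job's location included.
url_with_location = get_query_results_url(
    'a_project_id', '5ad9cc803baa432290b6cd0203f556d9',
    location='australia-southeast1')
```

In Beam itself the fix would presumably carry the location returned when the query job is created through to the polling call, rather than building URLs by hand.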

 

For reference, Airflow had the same bug/problem: 
https://github.com/apache/airflow/pull/4695

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
