vchiapaikeo opened a new pull request, #28796: URL: https://github.com/apache/airflow/pull/28796
`BigQueryColumnCheckOperator` contains a bug that raises a `TypeError` and causes tasks to fail at runtime. I uncovered it while looking into another issue for this operator: https://github.com/apache/airflow/issues/28343#issuecomment-1374350497

This PR fixes the operator by calling the list's `extend()` method instead of calling the list itself. It also adds a few tests.

As an aside, I had to set `SKIP=run-mypy` for my commit because I ran into this unusual pre-commit failure, which doesn't seem related to this change:

```
Run mypy for providers.................................................................Failed
- hook id: run-mypy
- exit code: 1

airflow/providers/google/cloud/operators/bigquery.py:250: error: "BigQueryCheckOperator" has no attribute "_raise_exception"  [attr-defined]
                self._raise_exception(f"Test failed.\nQuery:\n{self.sql}\n...
                ^
Found 1 error in 1 file (checked 1 source file)

If you see strange stacktraces above, run `breeze ci-image build --python 3.7` and try again.
```

## Test Dag

```py
from airflow import DAG

from airflow.providers.google.cloud.operators.bigquery import BigQueryColumnCheckOperator

DEFAULT_TASK_ARGS = {
    "owner": "gcp-data-platform",
    "retries": 1,
    "retry_delay": 10,
    "start_date": "2022-08-01",
}

with DAG(
    max_active_runs=1,
    concurrency=2,
    catchup=False,
    schedule_interval="@daily",
    dag_id="test_bigquery_column_check",
    default_args=DEFAULT_TASK_ARGS,
) as dag:

    basic_column_quality_checks = BigQueryColumnCheckOperator(
        task_id="check_columns",
        table="my-project.vchiapaikeo.test1",
        use_legacy_sql=False,
        column_mapping={
            "col1": {"min": {"greater_than": 0}},
        },
    )
```

<img width="974" alt="image" src="https://user-images.githubusercontent.com/9200263/211229519-96a9f439-ffe4-4ddc-bf07-84e5b73bb45d.png">

Task Logs:

```
686f5b14989d
*** Reading local file: /root/airflow/logs/dag_id=test_bigquery_column_check/run_id=scheduled__2023-01-08T00:00:00+00:00/task_id=check_columns/attempt=3.log
[2023-01-09, 01:40:19 UTC] {taskinstance.py:1093} INFO - Dependencies all met for <TaskInstance: test_bigquery_column_check.check_columns scheduled__2023-01-08T00:00:00+00:00 [queued]>
[2023-01-09, 01:40:19 UTC] {taskinstance.py:1093} INFO - Dependencies all met for <TaskInstance: test_bigquery_column_check.check_columns scheduled__2023-01-08T00:00:00+00:00 [queued]>
[2023-01-09, 01:40:19 UTC] {taskinstance.py:1295} INFO - --------------------------------------------------------------------------------
[2023-01-09, 01:40:19 UTC] {taskinstance.py:1296} INFO - Starting attempt 3 of 4
[2023-01-09, 01:40:19 UTC] {taskinstance.py:1297} INFO - --------------------------------------------------------------------------------
[2023-01-09, 01:40:19 UTC] {taskinstance.py:1316} INFO - Executing <Task(BigQueryColumnCheckOperator): check_columns> on 2023-01-08 00:00:00+00:00
[2023-01-09, 01:40:19 UTC] {standard_task_runner.py:55} INFO - Started process 481 to run task
[2023-01-09, 01:40:20 UTC] {standard_task_runner.py:82} INFO - Running: ['***', 'tasks', 'run', 'test_bigquery_column_check', 'check_columns', 'scheduled__2023-01-08T00:00:00+00:00', '--job-id', '5', '--raw', '--subdir', 'DAGS_FOLDER/test_bigquery_column_check.py', '--cfg-path', '/tmp/tmpgeqtp2hz']
[2023-01-09, 01:40:20 UTC] {standard_task_runner.py:83} INFO - Job 5: Subtask check_columns
[2023-01-09, 01:40:21 UTC] {task_command.py:391} INFO - Running <TaskInstance: test_bigquery_column_check.check_columns scheduled__2023-01-08T00:00:00+00:00 [running]> on host 686f5b14989d
[2023-01-09, 01:40:21 UTC] {taskinstance.py:1525} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=gcp-data-platform
AIRFLOW_CTX_DAG_ID=test_bigquery_column_check
AIRFLOW_CTX_TASK_ID=check_columns
AIRFLOW_CTX_EXECUTION_DATE=2023-01-08T00:00:00+00:00
AIRFLOW_CTX_TRY_NUMBER=3
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-01-08T00:00:00+00:00
[2023-01-09, 01:40:21 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
[2023-01-09, 01:40:21 UTC] {credentials_provider.py:323} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2023-01-09, 01:40:21 UTC] {_default.py:649} WARNING - No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
[2023-01-09, 01:40:21 UTC] {bigquery.py:1539} INFO - Inserting job ***_1673228421636668_2d2a9b688dcd63bef1c449cd8b764f86
[2023-01-09, 01:40:23 UTC] {bigquery.py:601} INFO - Record: col_name check_type check_result
0 col1 min 2
[2023-01-09, 01:40:23 UTC] {bigquery.py:628} INFO - All tests have passed
[2023-01-09, 01:40:23 UTC] {taskinstance.py:1339} INFO - Marking task as SUCCESS.
dag_id=test_bigquery_column_check, task_id=check_columns, execution_date=20230108T000000, start_date=20230109T014019, end_date=20230109T014023
[2023-01-09, 01:40:23 UTC] {local_task_job.py:211} INFO - Task exited with return code 0
[2023-01-09, 01:40:23 UTC] {taskinstance.py:2613} INFO - 0 downstream tasks scheduled from follow-on schedule check
```

cc: @eladkal, @VladaZakharova, @denimalpaca
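
For readers unfamiliar with this failure mode, here is a minimal sketch of the kind of bug described in this PR. It is not the operator's actual code. It assumes the failure messages are collected in a plain list and that each check's dict carries a `success` flag once results are recorded. The names `collect_failed_checks`, `call_list_like_function`, and `mapping` are hypothetical and used only for illustration.

```python
def collect_failed_checks(column_mapping):
    """Return one message per check whose "success" flag is False (assumed key)."""
    failed_tests = []
    # Writing failed_tests(...) here instead would raise TypeError,
    # because a list object is not callable; extend() appends each message.
    failed_tests.extend(
        f"Column: {col}\nCheck: {check}\nCheck Values: {values}\n"
        for col, checks in column_mapping.items()
        for check, values in checks.items()
        if not values["success"]
    )
    return failed_tests


def call_list_like_function():
    """Reproduce the error class: calling a list raises TypeError."""
    failed_tests = []
    try:
        failed_tests("some message")  # deliberately wrong: lists are not callable
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical sample data: one passing check and one failing check.
mapping = {
    "col1": {
        "min": {"greater_than": 0, "result": 2, "success": True},
        "max": {"less_than": 1, "result": 2, "success": False},
    },
}

messages = collect_failed_checks(mapping)
error_text = call_list_like_function()
```

With this sample data, `messages` holds a single entry for the failing `max` check, and `error_text` holds the `TypeError` message produced by calling the list directly.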