vchiapaikeo opened a new pull request, #28796:
URL: https://github.com/apache/airflow/pull/28796

   
   The `BigQueryColumnCheckOperator` raises a `TypeError` at runtime, causing tasks to fail. This was initially uncovered while looking into another issue with this operator: https://github.com/apache/airflow/issues/28343#issuecomment-1374350497
   
   This PR fixes the operator by calling the list's `extend()` method instead of calling the list object itself (which is what raised the `TypeError`). It also adds a few tests.
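   To illustrate the failure mode, here is a minimal sketch (illustrative names only, not the operator's actual code): calling a list object as if it were a function raises a `TypeError`, while `extend()` appends each element in place.
   
   ```py
   # Minimal sketch of the bug (illustrative, not the operator's code).
   records = []
   new_records = [("col1", "min", 2)]  # (column, check, result) rows
   
   try:
       records(new_records)  # buggy: a list instance is not callable
   except TypeError as exc:
       print(exc)  # 'list' object is not callable
   
   records.extend(new_records)  # the fix: extend the list in place
   print(records)  # [('col1', 'min', 2)]
   ```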
   
   As an aside, I had to set `SKIP=run-mypy` during my commit because I ran into this unusual pre-commit failure, which doesn't seem related to this change:
   
   ```
   Run mypy for providers.................................................................Failed
   - hook id: run-mypy
   - exit code: 1
   
   airflow/providers/google/cloud/operators/bigquery.py:250: error:
   "BigQueryCheckOperator" has no attribute "_raise_exception"  [attr-defined]
                   self._raise_exception(f"Test failed.\nQuery:\n{self.sql}\n...
                   ^
   Found 1 error in 1 file (checked 1 source file)
   If you see strange stacktraces above, run `breeze ci-image build --python 3.7` and try again.
   ```
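   (For reference: pre-commit skips any hook whose id appears in the comma-separated `SKIP` environment variable, so the commit was made with something like `SKIP=run-mypy git commit`.)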
   
   ## Test DAG
   
   ```py
   from airflow import DAG
   from airflow.providers.google.cloud.operators.bigquery import BigQueryColumnCheckOperator
   
   DEFAULT_TASK_ARGS = {
       "owner": "gcp-data-platform",
       "retries": 1,
       "retry_delay": 10,
       "start_date": "2022-08-01",
   }
   
   with DAG(
       max_active_runs=1,
       concurrency=2,
       catchup=False,
       schedule_interval="@daily",
       dag_id="test_bigquery_column_check",
       default_args=DEFAULT_TASK_ARGS,
   ) as dag:
       basic_column_quality_checks = BigQueryColumnCheckOperator(
           task_id="check_columns",
           table="my-project.vchiapaikeo.test1",
           use_legacy_sql=False,
           column_mapping={
               "col1": {"min": {"greater_than": 0}},
           },
       )
   ```
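   For context, the `column_mapping` above asks the operator to verify that `MIN(col1)` is greater than 0 on the target table. A rough sketch of how such a check result is evaluated (illustrative only, not the operator's implementation; the record mirrors the task log below):
   
   ```py
   # Illustrative check evaluation (not the operator's actual code).
   # The record mirrors the task log below: col_name=col1, check_type=min,
   # check_result=2, checked against {"min": {"greater_than": 0}}.
   records = [("col1", "min", 2)]
   
   for col_name, check_type, check_result in records:
       threshold = 0  # the "greater_than" bound from column_mapping
       assert check_result > threshold, (
           f"{check_type}({col_name}) = {check_result} is not > {threshold}"
       )
   
   print("All tests have passed")  # matches the operator's success log line
   ```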
   
   <img width="974" alt="image" 
src="https://user-images.githubusercontent.com/9200263/211229519-96a9f439-ffe4-4ddc-bf07-84e5b73bb45d.png";>
   
   
   Task Logs:
   
   ```
   686f5b14989d
   *** Reading local file: /root/airflow/logs/dag_id=test_bigquery_column_check/run_id=scheduled__2023-01-08T00:00:00+00:00/task_id=check_columns/attempt=3.log
   [2023-01-09, 01:40:19 UTC] {taskinstance.py:1093} INFO - Dependencies all met for <TaskInstance: test_bigquery_column_check.check_columns scheduled__2023-01-08T00:00:00+00:00 [queued]>
   [2023-01-09, 01:40:19 UTC] {taskinstance.py:1093} INFO - Dependencies all met for <TaskInstance: test_bigquery_column_check.check_columns scheduled__2023-01-08T00:00:00+00:00 [queued]>
   [2023-01-09, 01:40:19 UTC] {taskinstance.py:1295} INFO - 
   --------------------------------------------------------------------------------
   [2023-01-09, 01:40:19 UTC] {taskinstance.py:1296} INFO - Starting attempt 3 of 4
   [2023-01-09, 01:40:19 UTC] {taskinstance.py:1297} INFO - 
   --------------------------------------------------------------------------------
   [2023-01-09, 01:40:19 UTC] {taskinstance.py:1316} INFO - Executing <Task(BigQueryColumnCheckOperator): check_columns> on 2023-01-08 00:00:00+00:00
   [2023-01-09, 01:40:19 UTC] {standard_task_runner.py:55} INFO - Started process 481 to run task
   [2023-01-09, 01:40:20 UTC] {standard_task_runner.py:82} INFO - Running: ['***', 'tasks', 'run', 'test_bigquery_column_check', 'check_columns', 'scheduled__2023-01-08T00:00:00+00:00', '--job-id', '5', '--raw', '--subdir', 'DAGS_FOLDER/test_bigquery_column_check.py', '--cfg-path', '/tmp/tmpgeqtp2hz']
   [2023-01-09, 01:40:20 UTC] {standard_task_runner.py:83} INFO - Job 5: Subtask check_columns
   [2023-01-09, 01:40:21 UTC] {task_command.py:391} INFO - Running <TaskInstance: test_bigquery_column_check.check_columns scheduled__2023-01-08T00:00:00+00:00 [running]> on host 686f5b14989d
   [2023-01-09, 01:40:21 UTC] {taskinstance.py:1525} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=gcp-data-platform
   AIRFLOW_CTX_DAG_ID=test_bigquery_column_check
   AIRFLOW_CTX_TASK_ID=check_columns
   AIRFLOW_CTX_EXECUTION_DATE=2023-01-08T00:00:00+00:00
   AIRFLOW_CTX_TRY_NUMBER=3
   AIRFLOW_CTX_DAG_RUN_ID=scheduled__2023-01-08T00:00:00+00:00
   [2023-01-09, 01:40:21 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   [2023-01-09, 01:40:21 UTC] {credentials_provider.py:323} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
   [2023-01-09, 01:40:21 UTC] {_default.py:649} WARNING - No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
   [2023-01-09, 01:40:21 UTC] {bigquery.py:1539} INFO - Inserting job ***_1673228421636668_2d2a9b688dcd63bef1c449cd8b764f86
   [2023-01-09, 01:40:23 UTC] {bigquery.py:601} INFO - Record:   col_name check_type  check_result
   0     col1        min             2
   [2023-01-09, 01:40:23 UTC] {bigquery.py:628} INFO - All tests have passed
   [2023-01-09, 01:40:23 UTC] {taskinstance.py:1339} INFO - Marking task as SUCCESS. dag_id=test_bigquery_column_check, task_id=check_columns, execution_date=20230108T000000, start_date=20230109T014019, end_date=20230109T014023
   [2023-01-09, 01:40:23 UTC] {local_task_job.py:211} INFO - Task exited with return code 0
   [2023-01-09, 01:40:23 UTC] {taskinstance.py:2613} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
   
   cc: @eladkal, @VladaZakharova, @denimalpaca
   