[GitHub] [airflow] jedcunningham commented on a diff in pull request #32099: Support setting dependencies for tasks called outside setup/teardown context manager
jedcunningham commented on code in PR #32099: URL: https://github.com/apache/airflow/pull/32099#discussion_r1240592248

## airflow/models/baseoperator.py:

@@ -1001,6 +1001,14 @@ def __enter__(self):
     def __exit__(self, exc_type, exc_val, exc_tb):
         SetupTeardownContext.set_work_task_roots_and_leaves()

+    @staticmethod
+    def add_ctx_task(task):

Review Comment: I'm a little torn on this name. Maybe `add_task_to_ctx`? Are we planning on doing the same thing for adding tasks to a TaskGroup? Can we check that this isn't being used outside a context manager?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] amoghrajesh commented on pull request #31376: Add valid spark on k8s airflow connection urls to tests
amoghrajesh commented on PR #31376: URL: https://github.com/apache/airflow/pull/31376#issuecomment-1605244487

@hussein-awala can we merge this PR too, as the dependent PR has been merged recently?
[GitHub] [airflow] vandonr-amz commented on a diff in pull request #32112: deferrable mode for `SageMakerTuningOperator`
vandonr-amz commented on code in PR #32112: URL: https://github.com/apache/airflow/pull/32112#discussion_r1240552004

## tests/system/providers/amazon/aws/example_sagemaker.py:

@@ -159,12 +159,11 @@ def _build_and_upload_docker_image(preprocess_script, repository_uri):
     docker_build_and_push_commands = f"""
         cp /root/.aws/credentials /tmp/credentials &&
         # login to public ecr repo containing amazonlinux image
-        docker login --username {creds.username} --password {creds.password} public.ecr.aws
+        docker login --username {creds.username} --password {creds.password} public.ecr.aws &&

Review Comment: The old version was working too, but at least this is homogeneous with the other lines of the command.

## tests/system/providers/amazon/aws/example_sagemaker.py:

@@ -178,7 +177,8 @@ def _build_and_upload_docker_image(preprocess_script, repository_uri):
     if docker_build.returncode != 0:
         raise RuntimeError(
             "Failed to prepare docker image for the preprocessing job.\n"
-            f"The following error happened while executing the sequence of bash commands:\n{stderr}"
+            "The following error happened while executing the sequence of bash commands:\n"
+            f"{stderr.decode()}"

Review Comment: stderr is a byte array. Without the `decode()`, the `\n` are not unescaped, for instance, and the whole output appears on one line, which is _not the best_.

## tests/system/providers/amazon/aws/example_sagemaker.py:

@@ -159,12 +159,11 @@ def _build_and_upload_docker_image(preprocess_script, repository_uri):
     docker_build_and_push_commands = f"""
         cp /root/.aws/credentials /tmp/credentials &&
         # login to public ecr repo containing amazonlinux image
-        docker login --username {creds.username} --password {creds.password} public.ecr.aws
+        docker login --username {creds.username} --password {creds.password} public.ecr.aws &&
         docker build --platform=linux/amd64 -f {dockerfile.name} -t {repository_uri} /tmp &&
         rm /tmp/credentials &&
         # login again, this time to the private repo we created to hold that specific image
-        aws ecr get-login-password --region {ecr_region} |

Review Comment: This is probably from before, when we didn't pass credentials in the command line, but it's now useless.

## airflow/providers/amazon/aws/triggers/sagemaker.py:

@@ -41,7 +42,7 @@ def __init__(
         job_name: str,
         job_type: str,
         poke_interval: int = 30,
-        max_attempts: int | None = None,
+        max_attempts: int = 480,

Review Comment: Passing None was not working before, so this is not a breaking change, since it was already broken.
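For context on the `decode()` remark, here is a minimal, self-contained illustration of how interpolating raw bytes into an f-string differs from decoding first (the variable names are just for this sketch):

```python
stderr = b"line one\nline two\n"

# Interpolating the bytes object embeds its repr, so "\n" stays as a
# literal backslash-n and everything lands on one line:
as_bytes = f"error output:\n{stderr}"

# Decoding first yields real newlines in the message:
as_text = f"error output:\n{stderr.decode()}"

print(as_bytes)  # → error output:\nb'line one\nline two\n'  (repr on one line)
print(as_text)
```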
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #32113: Fix KubernetesPodOperator validate xcom json and add retries
boring-cyborg[bot] commented on PR #32113: URL: https://github.com/apache/airflow/pull/32113#issuecomment-1605198948

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). Here are some useful points:

- Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
- Consider using the [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally. It's a heavy docker image, but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or the final approval from Committers.
- Please follow the [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication, including (but not limited to) comments on Pull Requests, the mailing list and Slack.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).

Apache Airflow is a community-driven project and together we are making it better. In case of doubts contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://s.apache.org/airflow-slack
[GitHub] [airflow] aagateuip opened a new pull request, #32113: Fix KubernetesPodOperator validate xcom json and add retries
aagateuip opened a new pull request, #32113: URL: https://github.com/apache/airflow/pull/32113

- Added retries to extract_xcom to guard against intermittent network connectivity failures.
- The xcom json is validated to make sure the entire json was retrieved.
- The xcom sidecar is killed only if the xcom json that was retrieved was valid.

closes: #32111

Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines) for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
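The retry-and-validate flow described in the PR can be sketched roughly as follows. All names here are hypothetical stand-ins, not the operator's actual code; `read_payload` represents whatever reads the sidecar's xcom file:

```python
import json
import time

def fetch_xcom_with_retries(read_payload, attempts=3, delay=1.0):
    """Fetch the xcom payload, retrying on I/O errors or truncated JSON.

    `read_payload` is a hypothetical callable that may raise OSError on a
    network blip or return a partially transferred string.
    """
    last_err = None
    for _ in range(attempts):
        try:
            raw = read_payload()
            # json.loads raises ValueError on a partial payload, so a
            # successful parse doubles as the completeness check — only
            # then would it be safe to kill the sidecar container.
            return json.loads(raw)
        except (OSError, ValueError) as err:
            last_err = err
            time.sleep(delay)
    raise RuntimeError(f"could not retrieve a valid xcom json: {last_err}")

print(fetch_xcom_with_retries(lambda: '{"key": "value"}'))  # → {'key': 'value'}
```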
[GitHub] [airflow] vandonr-amz opened a new pull request, #32112: deferrable mode for `SageMakerTuningOperator`
vandonr-amz opened a new pull request, #32112: URL: https://github.com/apache/airflow/pull/32112

Took the opportunity to fix tiny things in the system test, and also to migrate the trigger to the method that logs the status while it waits.
[GitHub] [airflow] github-actions[bot] closed pull request #31040: Added kube-prometheus-stack servicemonitor to gather metrics from Airflow statsd exporter
github-actions[bot] closed pull request #31040: Added kube-prometheus-stack servicemonitor to gather metrics from Airflow statsd exporter URL: https://github.com/apache/airflow/pull/31040
[GitHub] [airflow] github-actions[bot] commented on pull request #30581: Make pause DAG its own role separate from edit DAG
github-actions[bot] commented on PR #30581: URL: https://github.com/apache/airflow/pull/30581#issuecomment-1605178177

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
[airflow] branch constraints-main updated: Updating constraints. Github run id:5361110862
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/constraints-main by this push:
     new b0a5ba8486  Updating constraints. Github run id:5361110862
b0a5ba8486 is described below

commit b0a5ba848667f6b05909ea55c18bea0614323d88
Author: Automated GitHub Actions commit
AuthorDate: Fri Jun 23 23:48:04 2023 +

    Updating constraints. Github run id:5361110862

    This update in constraints is automatically committed by the CI 'constraints-push' step based
    on 'refs/heads/main' in the 'apache/airflow' repository with commit sha
    b28c90354f110bd598ddce193cf82cb1416adbc8.

    The action that build those constraints can be found at
    https://github.com/apache/airflow/actions/runs/5361110862/

    The image tag used for that build was: b28c90354f110bd598ddce193cf82cb1416adbc8. You can
    enter Breeze environment with this image by running
    'breeze shell --image-tag b28c90354f110bd598ddce193cf82cb1416adbc8'

    All tests passed in this build so we determined we can push the updated constraints.

    See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details.
---
 constraints-3.10.txt                  | 2 +-
 constraints-3.11.txt                  | 2 +-
 constraints-3.8.txt                   | 2 +-
 constraints-3.9.txt                   | 2 +-
 constraints-no-providers-3.10.txt     | 2 +-
 constraints-no-providers-3.11.txt     | 2 +-
 constraints-no-providers-3.8.txt      | 2 +-
 constraints-no-providers-3.9.txt      | 2 +-
 constraints-source-providers-3.10.txt | 2 +-
 constraints-source-providers-3.11.txt | 4 +---
 constraints-source-providers-3.8.txt  | 2 +-
 constraints-source-providers-3.9.txt  | 2 +-
 12 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/constraints-3.10.txt b/constraints-3.10.txt
index cca9ccbd63..c9b016b598 100644
--- a/constraints-3.10.txt
+++ b/constraints-3.10.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-06-23T22:25:08Z
+# This constraints file was automatically generated on 2023-06-23T23:43:47Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint generation.
diff --git a/constraints-3.11.txt b/constraints-3.11.txt
index c480e87702..ca4dec2b5d 100644
--- a/constraints-3.11.txt
+++ b/constraints-3.11.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-06-23T22:29:20Z
+# This constraints file was automatically generated on 2023-06-23T23:48:02Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint generation.
diff --git a/constraints-3.8.txt b/constraints-3.8.txt
index 945e5a1ce1..5f76d6a34e 100644
--- a/constraints-3.8.txt
+++ b/constraints-3.8.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-06-23T22:25:27Z
+# This constraints file was automatically generated on 2023-06-23T23:44:10Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint generation.
diff --git a/constraints-3.9.txt b/constraints-3.9.txt
index c97ebdb51d..9515f25efb 100644
--- a/constraints-3.9.txt
+++ b/constraints-3.9.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-06-23T22:25:29Z
+# This constraints file was automatically generated on 2023-06-23T23:44:13Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint generation.
diff --git a/constraints-no-providers-3.10.txt b/constraints-no-providers-3.10.txt
index fe4802d4cc..3ff0c1fe9e 100644
--- a/constraints-no-providers-3.10.txt
+++ b/constraints-no-providers-3.10.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-06-23T22:21:45Z
+# This constraints file was automatically generated on 2023-06-23T23:40:24Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of
 # the branch, without installing any of the providers.
diff --git a/constraints-no-providers-3.11.txt b/constraints-no-providers-3.11.txt
index
[GitHub] [airflow] ITJamie commented on a diff in pull request #30531: [WIP] Migrate core `EmailOperator` to SMTP provider
ITJamie commented on code in PR #30531: URL: https://github.com/apache/airflow/pull/30531#discussion_r1240510568

## airflow/utils/email.py:

@@ -19,22 +19,69 @@
 import collections.abc
 import logging
-import os
 import re
-import smtplib
 import warnings
-from email.mime.application import MIMEApplication
-from email.mime.multipart import MIMEMultipart
-from email.mime.text import MIMEText
-from email.utils import formatdate
 from typing import Any, Iterable

 from airflow.configuration import conf
-from airflow.exceptions import AirflowConfigException, AirflowException, RemovedInAirflow3Warning
+from airflow.exceptions import AirflowConfigException, RemovedInAirflow3Warning
+from airflow.models import Connection
+from airflow.providers.smtp.hooks.smtp import SmtpHook

 log = logging.getLogger(__name__)

+class _SmtpHook(SmtpHook):
+    @classmethod
+    def get_connection(cls, conn_id: str) -> Connection:
+        try:
+            connection = super().get_connection(conn_id)
+        except Exception:
+            connection = Connection()
+
+        extra = connection.extra_dejson
+
+        # try to load some variables from Airflow config to update connection extra
+        from_email = conf.get("smtp", "SMTP_MAIL_FROM") or conf.get("email", "from_email", fallback=None)
+        if from_email:
+            extra["from_email"] = from_email
+
+        smtp_host = conf.get("smtp", "SMTP_HOST", fallback=None)
+        if smtp_host:
+            connection.host = smtp_host
+        smtp_port = conf.getint("smtp", "SMTP_PORT", fallback=None)
+        if smtp_port:
+            connection.port = smtp_port
+        smtp_starttls = conf.getboolean("smtp", "SMTP_STARTTLS", fallback=None)
+        if smtp_starttls is not None:
+            extra["disable_tls"] = not smtp_starttls
+        smtp_ssl = conf.getboolean("smtp", "SMTP_SSL", fallback=None)
+        if smtp_ssl is not None:
+            extra["disable_ssl"] = not smtp_ssl
+        smtp_retry_limit = conf.getint("smtp", "SMTP_RETRY_LIMIT", fallback=None)
+        if smtp_retry_limit is not None:
+            extra["retry_limit"] = smtp_retry_limit
+        smtp_timeout = conf.getint("smtp", "SMTP_TIMEOUT", fallback=None)
+        if smtp_timeout is not None:
+            extra["timeout"] = smtp_timeout
+
+        # for credentials, we use the connection if it exists, otherwise we use the config
+        if connection.login is None or connection.password is None:
+            warnings.warn(

Review Comment: This warning incorrectly assumes that it has to fall back to cfg settings to find a username or password for smtp. Not all smtp servers require authentication to send mail, so a connection with no username or password does not mean the connection config is wrong and that we should fall back to the legacy config.
[GitHub] [airflow] getaaron commented on issue #31105: airflow-webserver crashes if log files are larger than webserver available RAM
getaaron commented on issue #31105: URL: https://github.com/apache/airflow/issues/31105#issuecomment-1605119355

> Yeah, I thought of a way where we could maintain separate log_pos for different streams. This way we do not have to pull the whole file into memory just to get the final few lines. But I have some reservations about this with regard to whether the log order would change. I will look into this.

If the separate log files are already sorted (I assume they are) then you can use a k-way merge to produce a sorted combined list without loading the whole thing into memory: https://en.m.wikipedia.org/wiki/K-way_merge_algorithm

If they're not already sorted (unlikely) then you could sort them individually, then use the k-way merge sort.
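The k-way merge suggested above can be done with the standard library alone; a minimal illustration with in-memory lists (in the real handler these would be lazy file iterators, so nothing is loaded fully into memory):

```python
import heapq

# Three already-sorted log streams, e.g. lines keyed by an ISO timestamp.
stream_a = ["2023-06-23T10:00 a", "2023-06-23T10:05 d"]
stream_b = ["2023-06-23T10:01 b", "2023-06-23T10:06 e"]
stream_c = ["2023-06-23T10:02 c"]

# heapq.merge lazily yields the next smallest item across all streams,
# keeping only one "head" element per stream in memory at a time.
merged = list(heapq.merge(stream_a, stream_b, stream_c))
print(merged[:3])  # → ['2023-06-23T10:00 a', '2023-06-23T10:01 b', '2023-06-23T10:02 c']
```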
[GitHub] [airflow] potiuk commented on pull request #32110: add deferrable mode for `AthenaOperator`
potiuk commented on PR #32110: URL: https://github.com/apache/airflow/pull/32110#issuecomment-1605084324

static checks
[GitHub] [airflow] potiuk commented on pull request #31920: Deprecate the 2 non-official elasticsearch libraries
potiuk commented on PR #31920: URL: https://github.com/apache/airflow/pull/31920#issuecomment-1605076347

Woooho!
[GitHub] [airflow] potiuk merged pull request #31920: Deprecate the 2 non-official elasticsearch libraries
potiuk merged PR #31920: URL: https://github.com/apache/airflow/pull/31920
[airflow] branch main updated: Deprecate the 2 non-official elasticsearch libraries (#31920)
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new b28c90354f  Deprecate the 2 non-official elasticsearch libraries (#31920)
b28c90354f is described below

commit b28c90354f110bd598ddce193cf82cb1416adbc8
Author: Owen Leung
AuthorDate: Sat Jun 24 06:44:51 2023 +0800

    Deprecate the 2 non-official elasticsearch libraries (#31920)

    Co-authored-by: eladkal <45845474+elad...@users.noreply.github.com>
---
 airflow/providers/elasticsearch/CHANGELOG.rst      |   6 ++
 .../providers/elasticsearch/hooks/elasticsearch.py |  39 ++-
 .../providers/elasticsearch/log/es_task_handler.py | 116 -
 airflow/providers/elasticsearch/provider.yaml      |   8 +-
 docker_tests/test_prod_image.py                    |   2 +-
 .../index.rst                                      |   2 -
 docs/spelling_wordlist.txt                         |   1 +
 generated/provider_dependencies.json               |   2 -
 tests/models/test_dag.py                           |   4 +-
 .../elasticsearch/log/test_es_task_handler.py      |   6 +-
 10 files changed, 143 insertions(+), 43 deletions(-)

diff --git a/airflow/providers/elasticsearch/CHANGELOG.rst b/airflow/providers/elasticsearch/CHANGELOG.rst
index 0b281154c6..e0f2aa0894 100644
--- a/airflow/providers/elasticsearch/CHANGELOG.rst
+++ b/airflow/providers/elasticsearch/CHANGELOG.rst
@@ -24,6 +24,12 @@ Changelog
 -

+5.0.0
+.
+
+.. note::
+   Deprecate non-official elasticsearch libraries. Only the official elasticsearch library was used
+
 4.5.1
 .
diff --git a/airflow/providers/elasticsearch/hooks/elasticsearch.py b/airflow/providers/elasticsearch/hooks/elasticsearch.py
index 1a008617bd..bc55762a15 100644
--- a/airflow/providers/elasticsearch/hooks/elasticsearch.py
+++ b/airflow/providers/elasticsearch/hooks/elasticsearch.py
@@ -20,9 +20,9 @@ from __future__ import annotations
 import warnings
 from functools import cached_property
 from typing import Any
+from urllib import parse

 from elasticsearch import Elasticsearch
-from es.elastic.api import Connection as ESConnection, connect

 from airflow.exceptions import AirflowProviderDeprecationWarning
 from airflow.hooks.base import BaseHook
@@ -30,6 +30,43 @@ from airflow.models.connection import Connection as AirflowConnection
 from airflow.providers.common.sql.hooks.sql import DbApiHook


+def connect(
+    host: str = "localhost",
+    port: int = 9200,
+    user: str | None = None,
+    password: str | None = None,
+    scheme: str = "http",
+    **kwargs: Any,
+) -> ESConnection:
+    return ESConnection(host, port, user, password, scheme, **kwargs)
+
+
+class ESConnection:
+    """wrapper class for elasticsearch.Elasticsearch."""
+
+    def __init__(
+        self,
+        host: str = "localhost",
+        port: int = 9200,
+        user: str | None = None,
+        password: str | None = None,
+        scheme: str = "http",
+        **kwargs: Any,
+    ):
+        self.host = host
+        self.port = port
+        self.user = user
+        self.password = password
+        self.scheme = scheme
+        self.kwargs = kwargs
+        netloc = f"{host}:{port}"
+        self.url = parse.urlunparse((scheme, netloc, "/", None, None, None))
+        if user and password:
+            self.es = Elasticsearch(self.url, http_auth=(user, password), **self.kwargs)
+        else:
+            self.es = Elasticsearch(self.url, **self.kwargs)
+
+
 class ElasticsearchSQLHook(DbApiHook):
     """
     Interact with Elasticsearch through the elasticsearch-dbapi.
diff --git a/airflow/providers/elasticsearch/log/es_task_handler.py b/airflow/providers/elasticsearch/log/es_task_handler.py
index cb949566d6..8c8847fe1a 100644
--- a/airflow/providers/elasticsearch/log/es_task_handler.py
+++ b/airflow/providers/elasticsearch/log/es_task_handler.py
@@ -30,7 +30,7 @@ from urllib.parse import quote
 # Using `from elasticsearch import *` would break elasticsearch mocking used in unit test.
 import elasticsearch
 import pendulum
-from elasticsearch_dsl import Search
+from elasticsearch.exceptions import ElasticsearchException, NotFoundError

 from airflow.configuration import conf
 from airflow.exceptions import AirflowProviderDeprecationWarning
@@ -52,6 +52,34 @@ EsLogMsgType = List[Tuple[str, str]]
 USE_PER_RUN_LOG_ID = hasattr(DagRun, "get_log_template")


+class Log:
+    """wrapper class to mimic the attributes in Search class used in elasticsearch_dsl.Search."""
+
+    def __init__(self, offset):
+        self.offset = offset
+
+
+class ElasticSearchResponse:
+    """wrapper class to mimic the Search class used in elasticsearch_dsl.Search."""
+
+    def __init__(self, **kwargs):
+        # Store all provided keyword arguments as attributes of this object
+        for
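The `ESConnection` wrapper in the hook diff builds its endpoint URL with `urllib.parse.urlunparse`. A quick standalone check of what that construction produces (the `build_es_url` helper name is just for this sketch):

```python
from urllib import parse

def build_es_url(host="localhost", port=9200, scheme="http"):
    # Same six-tuple construction as in the ESConnection wrapper:
    # (scheme, netloc, path, params, query, fragment)
    netloc = f"{host}:{port}"
    return parse.urlunparse((scheme, netloc, "/", None, None, None))

print(build_es_url())                              # → http://localhost:9200/
print(build_es_url("es.internal", 9243, "https"))  # → https://es.internal:9243/
```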
[GitHub] [airflow] potiuk commented on pull request #22253: Add SparkKubernetesOperator crd implementation
potiuk commented on PR #22253: URL: https://github.com/apache/airflow/pull/22253#issuecomment-1605071290

errors.
[GitHub] [airflow] dstandish commented on pull request #32053: Change `as_setup` and `as_teardown` to instance methods
dstandish commented on PR #32053: URL: https://github.com/apache/airflow/pull/32053#issuecomment-1605068228

OK @uranusjr @ephraimbuddy @jedcunningham, the scope of this PR expanded a little bit. I wanted to add validation so that (1) a task can't be both setup and teardown, (2) once you set a task to be teardown, you can't change it to be setup, or vice versa, (3) for mapped operators, users are explicitly forbidden from setting `is_teardown`, `is_setup` or `on_failure_fail_dagrun`, and (4) users are forbidden from setting `on_failure_fail_dagrun` unless the task is a teardown task. Clamping down a little more in this way should make the feature a little more user friendly by preventing some usages / configurations that don't make sense or are in any case unsupported.

To accomplish this I changed the three attrs to be properties. I added them to AbstractOperator and I override the setter in MappedOperator to throw an error. The reason it made sense in AbstractOperator instead of BaseOperator is that in taskflow / xcomarg the iter_references method yields type `Operator`, which is either BaseOperator or MappedOperator, both of which inherit separately from AbstractOperator. So when implementing as_teardown etc. for taskflow, we can just set the attrs without inspecting the type, and let MappedOperator throw when it's not supported.

Ready for a look now. Thank you.
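The "property on the base class, setter overridden in the subclass to throw" approach described above is a general Python pattern. A minimal sketch with invented class names (`AbstractOp`, `MappedOp`), not Airflow's actual classes:

```python
class AbstractOp:
    """Base class exposing is_teardown as a property (illustrative only)."""

    _is_teardown = False

    @property
    def is_teardown(self):
        return self._is_teardown

    @is_teardown.setter
    def is_teardown(self, value):
        self._is_teardown = value


class MappedOp(AbstractOp):
    """Subclass that keeps the getter but rejects any assignment."""

    # Reuse the base property's getter and swap in a raising setter:
    @AbstractOp.is_teardown.setter
    def is_teardown(self, value):
        raise ValueError("cannot set is_teardown on a mapped operator")


base = AbstractOp()
base.is_teardown = True        # fine on the base class
mapped = MappedOp()
try:
    mapped.is_teardown = True  # the overridden setter raises
except ValueError as e:
    print(e)  # → cannot set is_teardown on a mapped operator
```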
[GitHub] [airflow] potiuk commented on issue #32048: Allow providers to add default connections
potiuk commented on issue #32048: URL: https://github.com/apache/airflow/issues/32048#issuecomment-1605064459

But yes. I am all for clarifying the behaviour, getting `airflow db init` renamed, and defining when we create default connections (and whether they should be coming from providers, or it's enough for them to be in the core). It would be great if someone leads it and proposes a solution/approves and fixes it :).
[GitHub] [airflow] potiuk commented on issue #32048: Allow providers to add default connections
potiuk commented on issue #32048: URL: https://github.com/apache/airflow/issues/32048#issuecomment-1605062163

> How do we decide which providers would have default connections out of the box or in the past, since I don't see it for all of them? Were there any criteria to be met?
>
> Also, I think we need to document this behavior as well, as discussed in [comment](https://github.com/apache/airflow/pull/31533#discussion_r1236891857).

It's historical. We have no rules.
[airflow] branch main updated: Add an option to load the dags from db for command tasks run (#32038)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 05a67efe32 Add an option to load the dags from db for command tasks run (#32038) 05a67efe32 is described below commit 05a67efe32af248ca191ea59815b3b202f893f46 Author: Hussein Awala AuthorDate: Sat Jun 24 00:31:05 2023 +0200 Add an option to load the dags from db for command tasks run (#32038) Signed-off-by: Hussein Awala --- airflow/cli/cli_config.py | 2 ++ airflow/cli/commands/task_command.py| 2 +- airflow/utils/cli.py| 21 --- tests/cli/commands/test_task_command.py | 36 + 4 files changed, 53 insertions(+), 8 deletions(-) diff --git a/airflow/cli/cli_config.py b/airflow/cli/cli_config.py index 0c69571fea..587934769e 100644 --- a/airflow/cli/cli_config.py +++ b/airflow/cli/cli_config.py @@ -604,6 +604,7 @@ ARG_PICKLE = Arg(("-p", "--pickle"), help="Serialized pickle object of the entir ARG_JOB_ID = Arg(("-j", "--job-id"), help=argparse.SUPPRESS) ARG_CFG_PATH = Arg(("--cfg-path",), help="Path to config file to use instead of airflow.cfg") ARG_MAP_INDEX = Arg(("--map-index",), type=int, default=-1, help="Mapped task index") +ARG_READ_FROM_DB = Arg(("--read-from-db",), help="Read dag from DB instead of dag file", action="store_true") # database @@ -1453,6 +1454,7 @@ TASKS_COMMANDS = ( ARG_SHUT_DOWN_LOGGING, ARG_MAP_INDEX, ARG_VERBOSE, +ARG_READ_FROM_DB, ), ), ActionCommand( diff --git a/airflow/cli/commands/task_command.py b/airflow/cli/commands/task_command.py index aab8bb10dc..560764536b 100644 --- a/airflow/cli/commands/task_command.py +++ b/airflow/cli/commands/task_command.py @@ -398,7 +398,7 @@ def task_run(args, dag: DAG | None = None) -> TaskReturnCode | None: print(f"Loading pickle id: {args.pickle}") _dag = get_dag_by_pickle(args.pickle) elif not dag: -_dag = get_dag(args.subdir, args.dag_id) +_dag = get_dag(args.subdir, 
args.dag_id, args.read_from_db) else: _dag = dag task = _dag.get_task(task_id=args.task_id) diff --git a/airflow/utils/cli.py b/airflow/utils/cli.py index d9e53ac072..56ac166b54 100644 --- a/airflow/utils/cli.py +++ b/airflow/utils/cli.py @@ -215,27 +215,34 @@ def _search_for_dag_file(val: str | None) -> str | None: return None -def get_dag(subdir: str | None, dag_id: str) -> DAG: +def get_dag(subdir: str | None, dag_id: str, from_db: bool = False) -> DAG: """ Returns DAG of a given dag_id. -First it we'll try to use the given subdir. If that doesn't work, we'll try to +First we'll try to use the given subdir. If that doesn't work, we'll try to find the correct path (assuming it's a file) and failing that, use the configured dags folder. """ from airflow.models import DagBag -first_path = process_subdir(subdir) -dagbag = DagBag(first_path) -if dag_id not in dagbag.dags: +if from_db: +dagbag = DagBag(read_dags_from_db=True) +else: +first_path = process_subdir(subdir) +dagbag = DagBag(first_path) +dag = dagbag.get_dag(dag_id) +if not dag: +if from_db: +raise AirflowException(f"Dag {dag_id!r} could not be found in DagBag read from database.") fallback_path = _search_for_dag_file(subdir) or settings.DAGS_FOLDER logger.warning("Dag %r not found in path %s; trying path %s", dag_id, first_path, fallback_path) dagbag = DagBag(dag_folder=fallback_path) -if dag_id not in dagbag.dags: +dag = dagbag.get_dag(dag_id) +if not dag: raise AirflowException( f"Dag {dag_id!r} could not be found; either it does not exist or it failed to parse." 
) -return dagbag.dags[dag_id] +return dag def get_dags(subdir: str | None, dag_id: str, use_regex: bool = False): diff --git a/tests/cli/commands/test_task_command.py b/tests/cli/commands/test_task_command.py index 376faeda41..646d76f47c 100644 --- a/tests/cli/commands/test_task_command.py +++ b/tests/cli/commands/test_task_command.py @@ -276,6 +276,42 @@ class TestCliTasks: external_executor_id=None, ) +@pytest.mark.parametrize( +"from_db", +[True, False], +) +@mock.patch("airflow.cli.commands.task_command.LocalTaskJobRunner") +def test_run_with_read_from_db(self, mock_local_job_runner, caplog, from_db): +""" +Test that we can run with read from db +""" +task0_id = self.dag.task_ids[0] +args0 = [ +"tasks", +"run", +"--ignore-all-dependencies", +"--local", +self.dag_id, +
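The lookup order introduced by the patch above can be sketched as follows. This is a minimal, self-contained illustration (not Airflow's actual code): `FakeDagBag` is a hypothetical stand-in for Airflow's `DagBag`, and the point is the control flow — reading from the DB has no filesystem fallback, while the subdir path falls back to the configured dags folder.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDagBag:
    """Hypothetical stand-in for airflow.models.DagBag."""

    dags: dict = field(default_factory=dict)

    def get_dag(self, dag_id):
        return self.dags.get(dag_id)


def get_dag(dag_id, subdir_bag, db_bag, fallback_bag, from_db=False):
    """Mirror the patched lookup order: DB bag (no fallback) vs.
    subdir bag with a fallback to the configured dags folder."""
    bag = db_bag if from_db else subdir_bag
    dag = bag.get_dag(dag_id)
    if dag:
        return dag
    if from_db:
        # Reading from the DB has no filesystem fallback.
        raise LookupError(f"Dag {dag_id!r} could not be found in DagBag read from database.")
    dag = fallback_bag.get_dag(dag_id)
    if not dag:
        raise LookupError(f"Dag {dag_id!r} could not be found.")
    return dag
```

This keeps `tasks run --read-from-db` fast (one DB read, no DAG-file parsing) at the cost of failing hard when the serialized DAG is missing.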
[GitHub] [airflow] potiuk merged pull request #32038: Add an option to load the dags from db for command tasks run
potiuk merged PR #32038: URL: https://github.com/apache/airflow/pull/32038 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk closed issue #32020: Airflow tasks run -m cli command giving 504 response
potiuk closed issue #32020: Airflow tasks run -m cli command giving 504 response URL: https://github.com/apache/airflow/issues/32020
[GitHub] [airflow] potiuk commented on pull request #29055: [AIP-51] Executors vending CLI commands
potiuk commented on PR #29055: URL: https://github.com/apache/airflow/pull/29055#issuecomment-1605060078 > This is still blocked by #30727 waiting for a merge of that one Maybe good time to merge it.
[airflow] branch constraints-main updated: Updating constraints. Github run id:5360683424
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new a5923a2b97 Updating constraints. Github run id:5360683424 a5923a2b97 is described below commit a5923a2b97f0d3a1be96bfb47d5472800df569b8 Author: Automated GitHub Actions commit AuthorDate: Fri Jun 23 22:29:23 2023 + Updating constraints. Github run id:5360683424 This update in constraints is automatically committed by the CI 'constraints-push' step based on 'refs/heads/main' in the 'apache/airflow' repository with commit sha d49fa999a94a2269dd6661fe5eebbb4c768c7848. The action that build those constraints can be found at https://github.com/apache/airflow/actions/runs/5360683424/ The image tag used for that build was: d49fa999a94a2269dd6661fe5eebbb4c768c7848. You can enter Breeze environment with this image by running 'breeze shell --image-tag d49fa999a94a2269dd6661fe5eebbb4c768c7848' All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. 
--- constraints-3.10.txt | 4 ++-- constraints-3.11.txt | 4 ++-- constraints-3.8.txt | 4 ++-- constraints-3.9.txt | 4 ++-- constraints-no-providers-3.10.txt | 2 +- constraints-no-providers-3.11.txt | 2 +- constraints-no-providers-3.8.txt | 2 +- constraints-no-providers-3.9.txt | 2 +- constraints-source-providers-3.10.txt | 4 ++-- constraints-source-providers-3.11.txt | 4 ++-- constraints-source-providers-3.8.txt | 4 ++-- constraints-source-providers-3.9.txt | 4 ++-- 12 files changed, 20 insertions(+), 20 deletions(-) diff --git a/constraints-3.10.txt b/constraints-3.10.txt index 8bf8d5fca1..cca9ccbd63 100644 --- a/constraints-3.10.txt +++ b/constraints-3.10.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T18:09:40Z +# This constraints file was automatically generated on 2023-06-23T22:25:08Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. @@ -401,7 +401,7 @@ msrestazure==0.6.4 multi-key-dict==2.0.3 multidict==6.0.4 mypy-boto3-appflow==1.26.157 -mypy-boto3-rds==1.26.144 +mypy-boto3-rds==1.26.160 mypy-boto3-redshift-data==1.26.109 mypy-boto3-s3==1.26.155 mypy-extensions==1.0.0 diff --git a/constraints-3.11.txt b/constraints-3.11.txt index 0e38d3c5fe..c480e87702 100644 --- a/constraints-3.11.txt +++ b/constraints-3.11.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T18:13:56Z +# This constraints file was automatically generated on 2023-06-23T22:29:20Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. 
@@ -397,7 +397,7 @@ msrestazure==0.6.4 multi-key-dict==2.0.3 multidict==6.0.4 mypy-boto3-appflow==1.26.157 -mypy-boto3-rds==1.26.144 +mypy-boto3-rds==1.26.160 mypy-boto3-redshift-data==1.26.109 mypy-boto3-s3==1.26.155 mypy-extensions==1.0.0 diff --git a/constraints-3.8.txt b/constraints-3.8.txt index 5242e7ff10..945e5a1ce1 100644 --- a/constraints-3.8.txt +++ b/constraints-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T18:10:07Z +# This constraints file was automatically generated on 2023-06-23T22:25:27Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. @@ -402,7 +402,7 @@ msrestazure==0.6.4 multi-key-dict==2.0.3 multidict==6.0.4 mypy-boto3-appflow==1.26.157 -mypy-boto3-rds==1.26.144 +mypy-boto3-rds==1.26.160 mypy-boto3-redshift-data==1.26.109 mypy-boto3-s3==1.26.155 mypy-extensions==1.0.0 diff --git a/constraints-3.9.txt b/constraints-3.9.txt index c6c7b1eb83..c97ebdb51d 100644 --- a/constraints-3.9.txt +++ b/constraints-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T18:10:01Z +# This constraints file was automatically generated on 2023-06-23T22:25:29Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from
[GitHub] [airflow] potiuk commented on issue #32111: KubernetesPodOperator job intermittently fails - unable to retrieve json from xcom sidecar container due to network connectivity issues
potiuk commented on issue #32111: URL: https://github.com/apache/airflow/issues/32111#issuecomment-1605058814 Assigned you. Feel free to provide PR.
[GitHub] [airflow] potiuk commented on pull request #32054: Upgrade Flask-AppBuilder
potiuk commented on PR #32054: URL: https://github.com/apache/airflow/pull/32054#issuecomment-1605048344 > I'm going to check other changes in appbuilder since 4.3.1 and create a summary. Cool. I think the only thing that matters are changes that might need to be ported to `airflow/www/fab_security`: security_manager.py and sqla/* classes. they have corresponding files in FAB and we should just port the changes since the previous version (4.3.1)
[GitHub] [airflow] potiuk commented on pull request #32016: Fix an issue that crashes Airflow Webserver when passed invalid private key path to Snowflake
potiuk commented on PR #32016: URL: https://github.com/apache/airflow/pull/32016#issuecomment-1605038514 conflicts :(
[GitHub] [airflow] boring-cyborg[bot] commented on issue #32111: KubernetesPodOperator job intermittently fails - unable to retrieve json from xcom sidecar container due to network connectivity issues
boring-cyborg[bot] commented on issue #32111: URL: https://github.com/apache/airflow/issues/32111#issuecomment-1605035075 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
[GitHub] [airflow] aagateuip opened a new issue, #32111: KubernetesPodOperator job intermittently fails - unable to retrieve json from xcom sidecar container due to network connectivity issues
aagateuip opened a new issue, #32111: URL: https://github.com/apache/airflow/issues/32111 ### Apache Airflow version Other Airflow 2 version (please specify below) ### What happened We have seen that KubernetesPodOperator sometimes fails to retrieve json from xcom sidecar container due to network connectivity issues or in some cases retrieves incomplete json which cannot be parsed. The KubernetesPodOperator task then fails with these error stack traces e.g. `File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 398, in execute result = self.extract_xcom(pod=self.pod) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 372, in extract_xcom result = self.pod_manager.extract_xcom(pod) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 369, in extract_xcom _preload_content=False, File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/stream/stream.py", line 35, in _websocket_request return api_method(*args, **kwargs) File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 994, in connect_get_namespaced_pod_exec return self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501 File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 1115, in connect_get_namespaced_pod_exec_with_http_info collection_formats=collection_formats) File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api _preload_content, _request_timeout, _host) File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api _request_timeout=_request_timeout) File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/stream/ws_client.py", line 518, in websocket_call 
raise ApiException(status=0, reason=str(e)) kubernetes.client.exceptions.ApiException: (0) Reason: Connection to remote host was lost.` OR ` File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 398, in execute result = self.extract_xcom(pod=self.pod) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 374, in extract_xcom return json.loads(result) File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/usr/local/lib/python3.7/json/decoder.py", line 353, in raw_decode obj, end = self.scan_once(s, idx) json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 4076 (char 4075) ` We are using airflow 2.6.1 and apache-airflow-providers-cncf-kubernetes==4.0.2 ### What you think should happen instead KubernetesPodOperator should not fail with these intermittent network connectivity issues when pulling JSON from the xcom sidecar container. It should have retries and verify that it was able to retrieve valid JSON before it kills the xcom sidecar container. extract_xcom should: * Read and try to parse the returned JSON when it is read from /airflow/xcom/return.json - to catch cases where, say due to network connectivity issues, we read incomplete (truncated) JSON * Add retries when we read the JSON - hopefully this will also guard against other network errors too (with the Kubernetes stream trying to talk to the airflow container to get the returned JSON) * Only if the returned JSON can be read and parsed (if it is valid) should the code go ahead and kill the sidecar container. ### How to reproduce This occurs intermittently so is hard to reproduce. Happens when the kubernetes cluster is under load. In 7 days we see this happen once or twice.
### Operating System Debian GNU/Linux 11 (bullseye) ### Versions of Apache Airflow Providers airflow 2.6.1 and apache-airflow-providers-cncf-kubernetes==4.0.2 ### Deployment Official Apache Airflow Helm Chart ### Deployment details _No response_ ### Anything else This occurs intermittently so is hard to reproduce. Happens when the kubernetes cluster is under load. In 7 days we see this happen once or twice. ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
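The fix the issue proposes — retry the sidecar read and validate the JSON *before* the sidecar container is killed — can be sketched like this. This is a hedged sketch, not the provider's implementation: `read_sidecar` is a hypothetical callable standing in for the Kubernetes-stream read of `/airflow/xcom/return.json`.

```python
import json
import time


def extract_xcom_with_retries(read_sidecar, retries=3, delay=1.0):
    """Retry reading the xcom sidecar output and only return once the
    payload parses as valid JSON, so a truncated read caused by a flaky
    network does not get treated as the final result."""
    last_err = None
    for _attempt in range(1, retries + 1):
        try:
            raw = read_sidecar()
            # json.loads fails loudly on incomplete/truncated payloads,
            # turning a silent corruption into a retryable error.
            return json.loads(raw)
        except (OSError, json.JSONDecodeError) as err:
            last_err = err
            time.sleep(delay)
    raise RuntimeError(f"Failed to read valid XCom JSON after {retries} attempts") from last_err
```

The caller would kill the sidecar container only after this returns successfully, which is exactly the ordering the issue asks for.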
[GitHub] [airflow] killua1zoldyck commented on issue #31105: airflow-webserver crashes if log files are larger than webserver available RAM
killua1zoldyck commented on issue #31105: URL: https://github.com/apache/airflow/issues/31105#issuecomment-1605020397 Yeah, I thought of a way where we could maintain separate log_pos for different streams. This way we do not have to pull the whole file into memory just to get the final few lines. But I have some reservations about this with regard to whether the log order would change. I will look into this.
[GitHub] [airflow] potiuk closed issue #31054: Can not get more logs of DAGS on UI when deploy on Kubernetes
potiuk closed issue #31054: Can not get more logs of DAGS on UI when deploy on Kubernetes URL: https://github.com/apache/airflow/issues/31054
[GitHub] [airflow] potiuk commented on issue #31054: Can not get more logs of DAGS on UI when deploy on Kubernetes
potiuk commented on issue #31054: URL: https://github.com/apache/airflow/issues/31054#issuecomment-1605004330 Can you please update to 2.6.2 - there were a few issues connected to logs fixed in 2.6.1 and 2.6.2 and they seem to be related (search for "logs" in the release notes). I think they should fix your issue. Closing it provisionally - we can always reopen if the upgrade does not help.
[GitHub] [airflow] guard13 commented on issue #15291: Airflow Docker tutorial: Unhealthy container
guard13 commented on issue #15291: URL: https://github.com/apache/airflow/issues/15291#issuecomment-1605003054 Good morning, here is the link to my solution, which works and lets me continue with Airflow either on the Datascientest VM (though it dropped me while resetting) or on my VMWare Ubuntu 20.04: https://support.datascientest.com/t/airflow-probleme-dinstallation/14687/3 I am answering my own question since I finally managed to launch Airflow in Docker both on the Datascientest VM and on my own VM (but the DataScientest VM let me down this evening). Everything works under Ubuntu 20.04 on VMWare, so the method works on Ubuntu 20.04. The method was very simple: 1- The installed version of Docker was no longer compatible with Airflow. 2- The installed version of Docker Compose was no longer compatible with Airflow. A - Uninstall Docker and reinstall it:
sudo apt remove docker docker-engine docker.io containerd runc
sudo apt autoremove
sudo rm -rf /var/lib/docker
sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io
docker -v
sudo systemctl status docker
B - Uninstall Docker Compose and reinstall it:
sudo rm /usr/local/bin/docker-compose
sudo apt install docker-compose
C - Then follow the commands from the official Airflow docs for running Airflow in Docker (https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html):
docker run --rm "debian:bullseye-slim" bash -c 'numfmt --to iec $(echo $(($(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE))))'
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.2/docker-compose.yaml'
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env
docker compose up airflow-init
docker compose up
AND SURPRISE, EVERYTHING WORKS. We log in with the username airflow and the password airflow. Courage, it was not easy... ATTENTION: this resets your .env file, which will need to be updated afterwards (for example for mongodb environment variables).
[airflow] branch main updated: bugfix: break down run+wait method in ECS operator (#32104)
This is an automated email from the ASF dual-hosted git repository. onikolas pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new d49fa999a9 bugfix: break down run+wait method in ECS operator (#32104) d49fa999a9 is described below commit d49fa999a94a2269dd6661fe5eebbb4c768c7848 Author: Raphaël Vandon AuthorDate: Fri Jun 23 14:31:07 2023 -0700 bugfix: break down run+wait method in ECS operator (#32104) This method is just causing trouble by handling several things, it's hiding the logic. A bug fixed in #31838 was reintroduced in #31881 because the check that was skipped on `wait_for_completion` was not skipped anymore. The bug is that checking the status will always fail if not waiting for completion, because obviously the task is not ready just after creation. --- airflow/providers/amazon/aws/operators/ecs.py | 77 +-- 1 file changed, 38 insertions(+), 39 deletions(-) diff --git a/airflow/providers/amazon/aws/operators/ecs.py b/airflow/providers/amazon/aws/operators/ecs.py index 2c2e93af35..91533cfa62 100644 --- a/airflow/providers/amazon/aws/operators/ecs.py +++ b/airflow/providers/amazon/aws/operators/ecs.py @@ -539,46 +539,8 @@ class EcsRunTaskOperator(EcsBaseOperator): if self.reattach: self._try_reattach_task(context) -self._start_wait_task(context) - -self._after_execution(session) - -if self.do_xcom_push and self.task_log_fetcher: -return self.task_log_fetcher.get_last_log_message() -else: -return None - -def execute_complete(self, context, event=None): -if event["status"] != "success": -raise AirflowException(f"Error in task execution: {event}") -self.arn = event["task_arn"] # restore arn to its updated value, needed for next steps -self._after_execution() -if self._aws_logs_enabled(): -# same behavior as non-deferrable mode, return last line of logs of the task. 
-logs_client = AwsLogsHook(aws_conn_id=self.aws_conn_id, region_name=self.region).conn -one_log = logs_client.get_log_events( -logGroupName=self.awslogs_group, -logStreamName=self._get_logs_stream_name(), -startFromHead=False, -limit=1, -) -if len(one_log["events"]) > 0: -return one_log["events"][0]["message"] - -@provide_session -def _after_execution(self, session=None): -self._check_success_task() - -self.log.info("ECS Task has been successfully executed") - -if self.reattach: -# Clear the XCom value storing the ECS task ARN if the task has completed -# as we can't reattach it anymore -self._xcom_del(session, self.REATTACH_XCOM_TASK_ID_TEMPLATE.format(task_id=self.task_id)) - -@AwsBaseHook.retry(should_retry_eni) -def _start_wait_task(self, context): if not self.arn: +# start the task except if we reattached to an existing one just before. self._start_task(context) if self.deferrable: @@ -598,6 +560,7 @@ class EcsRunTaskOperator(EcsBaseOperator): # 60 seconds is added to allow the trigger to exit gracefully (i.e. yield TriggerEvent) timeout=timedelta(seconds=self.waiter_max_attempts * self.waiter_delay + 60), ) +# self.defer raises a special exception, so execution stops here in this case. if not self.wait_for_completion: return @@ -615,9 +578,45 @@ class EcsRunTaskOperator(EcsBaseOperator): else: self._wait_for_task_ended() +self._after_execution(session) + +if self.do_xcom_push and self.task_log_fetcher: +return self.task_log_fetcher.get_last_log_message() +else: +return None + +def execute_complete(self, context, event=None): +if event["status"] != "success": +raise AirflowException(f"Error in task execution: {event}") +self.arn = event["task_arn"] # restore arn to its updated value, needed for next steps +self._after_execution() +if self._aws_logs_enabled(): +# same behavior as non-deferrable mode, return last line of logs of the task. 
+logs_client = AwsLogsHook(aws_conn_id=self.aws_conn_id, region_name=self.region).conn +one_log = logs_client.get_log_events( +logGroupName=self.awslogs_group, +logStreamName=self._get_logs_stream_name(), +startFromHead=False, +limit=1, +) +if len(one_log["events"]) > 0: +return one_log["events"][0]["message"] + +@provide_session +def _after_execution(self, session=None): +
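The control flow this bugfix restores is small but easy to get wrong: when the operator is not waiting for completion, it must return *before* any status check, because the ECS task cannot be in a success state right after creation. A minimal sketch of that ordering (hypothetical callables, not the operator's real API):

```python
def run_task(start, wait_for_completion, check_success):
    """Start a task, then only verify its status when we actually
    waited for it to finish - otherwise the check would always fail
    against a task that was just created."""
    start()
    if not wait_for_completion:
        return None  # skip the status check entirely (the fixed behavior)
    return check_success()
```

Collapsing `start` and `wait` into one method was what hid this ordering and let the regression slip back in between #31838 and #31881.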
[GitHub] [airflow] o-nikolas merged pull request #32104: bugfix: break down run+wait method in ECS operator
o-nikolas merged PR #32104: URL: https://github.com/apache/airflow/pull/32104
[GitHub] [airflow] vandonr-amz opened a new pull request, #32110: add deferrable mode for `AthenaOperator`
vandonr-amz opened a new pull request, #32110: URL: https://github.com/apache/airflow/pull/32110 (no comment)
[airflow] branch main updated: Use a waiter in `AthenaHook` (#31942)
This is an automated email from the ASF dual-hosted git repository. onikolas pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 72d09a677f Use a waiter in `AthenaHook` (#31942) 72d09a677f is described below commit 72d09a677fea22b51dbf20f3b12bae6b3c1e4792 Author: Raphaël Vandon AuthorDate: Fri Jun 23 14:20:37 2023 -0700 Use a waiter in `AthenaHook` (#31942) * Use custom waiters for Emr Serverless operators Update unit tests - Co-authored-by: Syed Hussain Co-authored-by: Vincent <97131062+vincb...@users.noreply.github.com> --- airflow/providers/amazon/aws/hooks/athena.py | 80 ++ airflow/providers/amazon/aws/operators/athena.py | 5 +- airflow/providers/amazon/aws/sensors/athena.py | 4 +- airflow/providers/amazon/aws/waiters/athena.json | 30 tests/providers/amazon/aws/hooks/test_athena.py| 12 ++-- .../providers/amazon/aws/operators/test_athena.py | 63 ++--- 6 files changed, 81 insertions(+), 113 deletions(-) diff --git a/airflow/providers/amazon/aws/hooks/athena.py b/airflow/providers/amazon/aws/hooks/athena.py index f68eee9355..b0d1878507 100644 --- a/airflow/providers/amazon/aws/hooks/athena.py +++ b/airflow/providers/amazon/aws/hooks/athena.py @@ -24,12 +24,14 @@ This module contains AWS Athena hook. """ from __future__ import annotations -from time import sleep +import warnings from typing import Any from botocore.paginate import PageIterator +from airflow.exceptions import AirflowException, AirflowProviderDeprecationWarning from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook +from airflow.providers.amazon.aws.utils.waiter_with_logging import wait class AthenaHook(AwsBaseHook): @@ -38,8 +40,7 @@ class AthenaHook(AwsBaseHook): Provide thick wrapper around :external+boto3:py:class:`boto3.client("athena") `. -:param sleep_time: Time (in seconds) to wait between two consecutive calls -to check query status on Athena. 
+:param sleep_time: obsolete, please use the parameter of `poll_query_status` method instead :param log_query: Whether to log athena query and other execution params when it's executed. Defaults to *True*. @@ -65,9 +66,20 @@ class AthenaHook(AwsBaseHook): "CANCELLED", ) -def __init__(self, *args: Any, sleep_time: int = 30, log_query: bool = True, **kwargs: Any) -> None: +def __init__( +self, *args: Any, sleep_time: int | None = None, log_query: bool = True, **kwargs: Any +) -> None: super().__init__(client_type="athena", *args, **kwargs) # type: ignore -self.sleep_time = sleep_time +if sleep_time is not None: +self.sleep_time = sleep_time +warnings.warn( +"The `sleep_time` parameter of the Athena hook is deprecated, " +"please pass this parameter to the poll_query_status method instead.", +AirflowProviderDeprecationWarning, +stacklevel=2, +) +else: +self.sleep_time = 30 # previous default value self.log_query = log_query def run_query( @@ -229,51 +241,31 @@ class AthenaHook(AwsBaseHook): return paginator.paginate(**result_params) def poll_query_status( -self, -query_execution_id: str, -max_polling_attempts: int | None = None, +self, query_execution_id: str, max_polling_attempts: int | None = None, sleep_time: int | None = None ) -> str | None: """Poll the state of a submitted query until it reaches final state. :param query_execution_id: ID of submitted athena query -:param max_polling_attempts: Number of times to poll for query state -before function exits +:param max_polling_attempts: Number of times to poll for query state before function exits +:param sleep_time: Time (in seconds) to wait between two consecutive query status checks. :return: One of the final states """ -try_number = 1 -final_query_state = None # Query state when query reaches final state or max_polling_attempts reached -while True: -query_state = self.check_query_status(query_execution_id) -if query_state is None: -self.log.info( -"Query execution id: %s, trial %s: Invalid query state. 
Retrying again", -query_execution_id, -try_number, -) -elif query_state in self.TERMINAL_STATES: -self.log.info( -"Query execution id: %s, trial %s: Query execution completed. Final state is %s", -query_execution_id, -try_number, -query_state, -
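The hand-rolled polling loop being deleted above is replaced by the shared `wait` helper from `waiter_with_logging`, whose pattern is: poll one attempt at a time so a status message can be logged between polls, and fail with a clear message after the attempt budget runs out. A self-contained sketch of that pattern (with a hypothetical `poll_once` callable standing in for a boto3 waiter attempt):

```python
import time


def wait_with_logging(poll_once, waiter_delay, max_attempts, failure_message, log=print):
    """Poll a single waiter attempt at a time, logging the in-progress
    status between polls, instead of blocking inside one opaque wait call."""
    for attempt in range(1, max_attempts + 1):
        done, status = poll_once()
        if done:
            return status
        log(f"attempt {attempt}: status is {status}")
        time.sleep(waiter_delay)
    raise TimeoutError(failure_message)
```

Centralizing this removes the per-hook `sleep_time`/`try_number` bookkeeping the old `AthenaHook.poll_query_status` carried around.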
[GitHub] [airflow] o-nikolas merged pull request #31942: Use a waiter in `AthenaHook`
o-nikolas merged PR #31942: URL: https://github.com/apache/airflow/pull/31942
[airflow] branch main updated: Refactor Eks Create Cluster Operator code (#31960)
This is an automated email from the ASF dual-hosted git repository.

onikolas pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 5c887988b0  Refactor Eks Create Cluster Operator code (#31960)
5c887988b0 is described below

commit 5c887988b02b02e60f693c9341013592a291ee27
Author: Syed Hussaain <103602455+syeda...@users.noreply.github.com>
AuthorDate: Fri Jun 23 14:18:13 2023 -0700

    Refactor Eks Create Cluster Operator code (#31960)

    * Refactor EksCreateClusterOperator to reuse code being used in multiple places

    * Update create_compute method to pass tests

      Add waiter params to EksCreateClusterOperator and EksCreateNodegroupOperator
      Update EksCreateFargateProfileTrigger and EksDeleteFargateProfileTrigger to use more consistent waiter names
      Update unit tests for triggers and operators
---
 airflow/providers/amazon/aws/operators/eks.py    | 249 +++
 airflow/providers/amazon/aws/triggers/eks.py     |  62 +++---
 tests/providers/amazon/aws/operators/test_eks.py |  32 ++-
 tests/providers/amazon/aws/triggers/test_eks.py  |  52 ++---
 4 files changed, 247 insertions(+), 148 deletions(-)

diff --git a/airflow/providers/amazon/aws/operators/eks.py b/airflow/providers/amazon/aws/operators/eks.py
index 8131be4f65..e280da5e5a 100644
--- a/airflow/providers/amazon/aws/operators/eks.py
+++ b/airflow/providers/amazon/aws/operators/eks.py
@@ -17,10 +17,11 @@
 """This module contains Amazon EKS operators."""
 from __future__ import annotations
 
+import logging
 import warnings
 from ast import literal_eval
 from datetime import timedelta
-from typing import TYPE_CHECKING, Any, List, Sequence, cast
+from typing import TYPE_CHECKING, List, Sequence, cast
 
 from botocore.exceptions import ClientError, WaiterError
 
@@ -31,6 +32,7 @@ from airflow.providers.amazon.aws.triggers.eks import (
     EksCreateFargateProfileTrigger,
     EksDeleteFargateProfileTrigger,
 )
+from airflow.providers.amazon.aws.utils.waiter_with_logging import wait
 
 try:
     from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
@@ -59,6 +61,75 @@
 NODEGROUP_FULL_NAME = "Amazon EKS managed node groups"
 FARGATE_FULL_NAME = "AWS Fargate profiles"
 
 
+def _create_compute(
+    compute: str | None,
+    cluster_name: str,
+    aws_conn_id: str,
+    region: str | None,
+    waiter_delay: int,
+    waiter_max_attempts: int,
+    wait_for_completion: bool = False,
+    nodegroup_name: str | None = None,
+    nodegroup_role_arn: str | None = None,
+    create_nodegroup_kwargs: dict | None = None,
+    fargate_profile_name: str | None = None,
+    fargate_pod_execution_role_arn: str | None = None,
+    fargate_selectors: list | None = None,
+    create_fargate_profile_kwargs: dict | None = None,
+    subnets: list[str] | None = None,
+):
+    log = logging.getLogger(__name__)
+    eks_hook = EksHook(aws_conn_id=aws_conn_id, region_name=region)
+    if compute == "nodegroup" and nodegroup_name:
+
+        # this is to satisfy mypy
+        subnets = subnets or []
+        create_nodegroup_kwargs = create_nodegroup_kwargs or {}
+
+        eks_hook.create_nodegroup(
+            clusterName=cluster_name,
+            nodegroupName=nodegroup_name,
+            subnets=subnets,
+            nodeRole=nodegroup_role_arn,
+            **create_nodegroup_kwargs,
+        )
+        if wait_for_completion:
+            log.info("Waiting for nodegroup to provision. This will take some time.")
+            wait(
+                waiter=eks_hook.conn.get_waiter("nodegroup_active"),
+                waiter_delay=waiter_delay,
+                max_attempts=waiter_max_attempts,
+                args={"clusterName": cluster_name, "nodegroupName": nodegroup_name},
+                failure_message="Nodegroup creation failed",
+                status_message="Nodegroup status is",
+                status_args=["nodegroup.status"],
+            )
+    elif compute == "fargate" and fargate_profile_name:
+
+        # this is to satisfy mypy
+        create_fargate_profile_kwargs = create_fargate_profile_kwargs or {}
+        fargate_selectors = fargate_selectors or []
+
+        eks_hook.create_fargate_profile(
+            clusterName=cluster_name,
+            fargateProfileName=fargate_profile_name,
+            podExecutionRoleArn=fargate_pod_execution_role_arn,
+            selectors=fargate_selectors,
+            **create_fargate_profile_kwargs,
+        )
+        if wait_for_completion:
+            log.info("Waiting for Fargate profile to provision. This will take some time.")
+            wait(
+                waiter=eks_hook.conn.get_waiter("fargate_profile_active"),
+                waiter_delay=waiter_delay,
+                max_attempts=waiter_max_attempts,
+                args={"clusterName": cluster_name, "fargateProfileName": fargate_profile_name},
+
[GitHub] [airflow] o-nikolas merged pull request #31960: Refactor Eks Create Cluster Operator code
o-nikolas merged PR #31960: URL: https://github.com/apache/airflow/pull/31960 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] killua1zoldyck commented on issue #31105: airflow-webserver crashes if log files are larger than webserver available RAM
killua1zoldyck commented on issue #31105: URL: https://github.com/apache/airflow/issues/31105#issuecomment-1604974487 @potiuk The current log-serving logic first combines all the logs and sorts them by timestamp, pulling the whole file into memory before seeking to the specific log_pos from the metadata. https://github.com/apache/airflow/blob/24e3d6ce57eae1784066ed5678369e61637285a4/airflow/utils/log/file_task_handler.py#L334-L350 So I don't think we can change the logic to make it streaming. Is my thinking correct?
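The memory blow-up described above comes from reading the whole file before seeking to `log_pos`. As a hedged illustration of the alternative being discussed (not Airflow's actual `file_task_handler` code — the function name, chunk size, and paths below are made up for the sketch), a generator can seek past already-served bytes and stream fixed-size chunks so memory use stays bounded regardless of file size:

```python
# Hypothetical sketch: stream a log file from a byte offset instead of
# loading it fully into memory. Names (stream_log_from, CHUNK_SIZE) are
# illustrative, not Airflow's real API.
from typing import Iterator

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time

def stream_log_from(path: str, log_pos: int) -> Iterator[str]:
    """Yield decoded chunks starting at byte offset log_pos."""
    with open(path, "rb") as f:
        f.seek(log_pos)  # skip already-served bytes without reading them
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk.decode("utf-8", errors="replace")

# Usage: write a sample "log", then stream it from byte offset 6.
with open("/tmp/sample.log", "w") as f:
    f.write("line1\nline2\nline3\n")
print("".join(stream_log_from("/tmp/sample.log", 6)))  # -> "line2\nline3\n"
```

Note this only streams a single file; the harder part killua1zoldyck points at — merging and timestamp-sorting chunks from several log sources before computing `log_pos` — is exactly what a per-file generator does not solve.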
[GitHub] [airflow] potiuk closed issue #32023: Container Image not trusting LDAP certificates
potiuk closed issue #32023: Container Image not trusting LDAP certificates URL: https://github.com/apache/airflow/issues/32023
[GitHub] [airflow] potiuk commented on a diff in pull request #31840: added difference between Deferrable and Non-Deferrable Operators
potiuk commented on code in PR #31840: URL: https://github.com/apache/airflow/pull/31840#discussion_r1240378757

## docs/apache-airflow/authoring-and-scheduling/deferring.rst: ##
@@ -166,3 +166,25 @@ Airflow tries to only run triggers in one place at once, and maintains a heartbe
 This means it's possible, but unlikely, for triggers to run in multiple places at once; this is designed into the Trigger contract, however, and entirely expected. Airflow will de-duplicate events fired when a trigger is running in multiple places simultaneously, so this process should be transparent to your Operators.
 Note that every extra ``triggerer`` you run will result in an extra persistent connection to your database.
+
+
+Difference between Mode='reschedule' and Deferrable=True in Sensors
+---
+
+In Airflow, Sensors wait for specific conditions to be met before proceeding with downstream tasks. Sensors have two options for managing idle periods: mode='reschedule' and deferrable=True.

Review Comment:
I think we should mention here that `deferrable=True` is a convention used by some operators, whereas `mode='reschedule'` is a BaseSensor parameter. This is mentioned as the last point in the table, but we do not have any framework support for it, and I can imagine other operators using a different convention.
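For readers skimming this thread, the distinction being documented can be sketched in plain Python. This is a conceptual illustration of the two idle-handling strategies only — the names `poke`, `wait_async`, and the hand-rolled "scheduler loop" are not Airflow APIs: `mode='reschedule'` frees the worker slot by exiting and letting the scheduler re-invoke a synchronous check later, while `deferrable=True` hands an awaitable to the triggerer's event loop.

```python
# Conceptual sketch only -- illustrative names, not Airflow internals.
import asyncio

def poke(state: dict) -> bool:
    """mode='reschedule' style: a cheap synchronous check; when it returns
    False, the task exits and the scheduler re-runs it after poke_interval."""
    return state["counter"] >= 3

async def wait_async(state: dict) -> str:
    """deferrable=True style: an awaitable that yields to an event loop
    (the triggerer) instead of occupying a worker slot while waiting."""
    while state["counter"] < 3:
        state["counter"] += 1
        await asyncio.sleep(0)  # cooperative yield; real triggers await I/O
    return "condition met"

state = {"counter": 0}
# The "scheduler" re-invokes the synchronous poke until it succeeds:
attempts = 0
while not poke(state):
    attempts += 1
    state["counter"] += 1
print(attempts)  # number of reschedules before the condition held
print(asyncio.run(wait_async({"counter": 0})))  # -> condition met
```

The design difference the review discusses then becomes visible: the first loop costs one task invocation per check, the second parks many waiters on one event loop.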
[GitHub] [airflow] potiuk commented on a diff in pull request #31840: added difference between Deferrable and Non-Deferrable Operators
potiuk commented on code in PR #31840: URL: https://github.com/apache/airflow/pull/31840#discussion_r1240373057

## docs/apache-airflow/authoring-and-scheduling/deferring.rst: ##
@@ -166,3 +166,25 @@ Airflow tries to only run triggers in one place at once, and maintains a heartbe
 This means it's possible, but unlikely, for triggers to run in multiple places at once; this is designed into the Trigger contract, however, and entirely expected. Airflow will de-duplicate events fired when a trigger is running in multiple places simultaneously, so this process should be transparent to your Operators.
 Note that every extra ``triggerer`` you run will result in an extra persistent connection to your database.
+
+
+Difference between Mode='reschedule' and Deferrable=True in Sensors
+---
+
+In Airflow, Sensors wait for specific conditions to be met before proceeding with downstream tasks. Sensors have two options for managing idle periods: mode='reschedule' and deferrable=True.
+
++++
+| Mode='reschedule'| Deferrable=True |

Review Comment:
NIT: Why capitalised?
[GitHub] [airflow] potiuk commented on pull request #32013: Introducing AirflowClusterPolicySkipDag exception
potiuk commented on PR #32013: URL: https://github.com/apache/airflow/pull/32013#issuecomment-1604933188 BTW. The description should be generic enough to be "transplantable", and it also needs ``.. versionadded:: 2.7`` in the documentation, as this exception will not be available for earlier versions of Airflow.
[GitHub] [airflow] potiuk commented on pull request #32013: Introducing AirflowClusterPolicySkipDag exception
potiuk commented on PR #32013: URL: https://github.com/apache/airflow/pull/32013#issuecomment-1604929908 I like the idea. BUT. This will only be useful and usable if you describe your use case in the documentation - https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html is the right place I think. It makes very little sense to add a change like that without a good description of how to use it (and a docstring on a class is not enough - we need a concrete example in the docs with a good use case).
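The kind of concrete best-practices example potiuk asks for might look like the sketch below. This is hypothetical: it assumes the exception proposed in PR #32013 can be raised from a `dag_policy` cluster policy like other policy exceptions, and a stand-in class is defined here so the sketch stays self-contained (in a real deployment you would import the exception from `airflow.exceptions` once the PR lands).

```python
# Hypothetical sketch of a cluster policy that skips (rather than fails)
# DAGs matching a site-specific rule. The stand-in exception mimics the
# AirflowClusterPolicySkipDag proposed in PR #32013.
class AirflowClusterPolicySkipDag(Exception):
    """Stand-in: raising this from dag_policy would skip loading the DAG."""

ENVIRONMENT = "dev"  # assumed site-specific setting, e.g. from an env var

def dag_policy(dag) -> None:
    """Cluster policy hook: skip DAGs tagged for a different environment."""
    if "prod-only" in getattr(dag, "tags", []) and ENVIRONMENT != "prod":
        raise AirflowClusterPolicySkipDag(
            f"DAG {dag.dag_id} is prod-only; skipping in {ENVIRONMENT}."
        )

# Minimal demo with a dummy DAG object instead of a real airflow.DAG:
class DummyDag:
    def __init__(self, dag_id, tags):
        self.dag_id, self.tags = dag_id, tags

try:
    dag_policy(DummyDag("billing", ["prod-only"]))
except AirflowClusterPolicySkipDag as e:
    print(e)  # -> DAG billing is prod-only; skipping in dev.
dag_policy(DummyDag("sandbox", []))  # not skipped; returns None
```

The design point under discussion: unlike raising a generic `AirflowClusterPolicyViolation`, a skip-style exception would drop the DAG from the deployment silently instead of surfacing an import error.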
[GitHub] [airflow] vidyapanchalZS commented on issue #32071: All DAGs file are parsing while executing some "airflow tasks" cli command
vidyapanchalZS commented on issue #32071: URL: https://github.com/apache/airflow/issues/32071#issuecomment-1604914542 Hello Hussein, thank you for looking into this request. In this request I also mentioned some other "airflow tasks" CLI commands which give output similar to the "airflow tasks run" command. Please also add the commands below to your feature request: airflow tasks clear, airflow tasks failed-deps, airflow tasks test.
[GitHub] [airflow] potiuk commented on pull request #29790: Provider Databricks add jobs create operator.
potiuk commented on PR #29790: URL: https://github.com/apache/airflow/pull/29790#issuecomment-1604888950 > @potiuk is it okay if someone else forks Kyle's repo and fix the static check to not lose @kyle-winkelman's credit/commits and also address the previous comments made by @alexott? I am also okay to wait a bit for Kyle to respond. Fine for me.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #32107: Improved error logging for failed Dataflow jobs
boring-cyborg[bot] commented on issue #32107: URL: https://github.com/apache/airflow/issues/32107#issuecomment-1604884866 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
[GitHub] [airflow] Srabasti opened a new issue, #32107: Improved error logging for failed Dataflow jobs
Srabasti opened a new issue, #32107: URL: https://github.com/apache/airflow/issues/32107

### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

When running a Dataflow job in Cloud Composer composer-1.20.12-airflow-1.10.15 (Airflow 1.10.15), the Dataflow job fails throwing a generic error "Exception: DataFlow failed with return code 1", and the reason for the failure is not clearly evident from the logs.

This issue is in Airflow 1: https://github.com/apache/airflow/blob/d3b066931191b82880d216af103517ea941c74ba/airflow/contrib/hooks/gcp_dataflow_hook.py#L172

This issue still exists in Airflow 2: https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/hooks/dataflow.py#L1019

Can the error logging be improved to show the exact reason, displaying a few lines of the standard error from the Dataflow command run so that it gives context? This would help Dataflow users identify the root cause of the issue directly from the logs and avoid additional research and troubleshooting through the log details in Cloud Logging. I am happy to contribute and raise a PR to help implement the bug fix. I am looking for inputs as to how to integrate with the existing code base. Thanks for your help in advance!

Srabasti Banerjee

### What you think should happen instead

[2023-06-15 14:04:37,071] {taskinstance.py:1152} ERROR - DataFlow failed with return code 1
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 985, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/airflow/gcs/dags/X.zip/X.py", line Y, in task
    DataFlowPythonOperator(
  File "/usr/local/lib/airflow/airflow/contrib/operators/dataflow_operator.py", line 379, in execute
    hook.start_python_dataflow(
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_dataflow_hook.py", line 245, in start_python_dataflow
    self._start_dataflow(variables, name, ["python"] + py_options + [dataflow],
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_api_base_hook.py", line 363, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_dataflow_hook.py", line 204, in _start_dataflow
    job_id = _Dataflow(cmd).wait_for_done()
  File "/usr/local/lib/airflow/airflow/contrib/hooks/gcp_dataflow_hook.py", line 178, in wait_for_done
    raise Exception("DataFlow failed with return code {}".format(
Exception: DataFlow failed with return code 1

### How to reproduce

Any failed Dataflow job that involves deleting a file while it is being ingested by a Dataflow job task run via Cloud Composer. Please let me know of any details needed.

### Operating System

Cloud Composer

### Versions of Apache Airflow Providers

_No response_

### Deployment

Google Cloud Composer

### Deployment details

_No response_

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
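The improvement this issue asks for amounts to capturing the subprocess's stderr and surfacing its tail in the raised exception instead of only the return code. A hedged sketch of that general pattern follows — plain `subprocess`, not a patch to the actual `_Dataflow`/`wait_for_done` internals, and the function name is made up:

```python
# Illustrative sketch: include the last lines of stderr in the raised
# exception so the failure reason is visible in the task log. Generic
# subprocess pattern, not the real dataflow hook code.
import subprocess
import sys

def run_with_error_context(cmd: list[str], tail_lines: int = 5) -> None:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        tail = "\n".join(proc.stderr.splitlines()[-tail_lines:])
        raise RuntimeError(
            f"Command failed with return code {proc.returncode}. "
            f"Last stderr lines:\n{tail}"
        )

try:
    # Simulate a failing "job" that writes its real reason to stderr.
    run_with_error_context(
        [sys.executable, "-c",
         "import sys; sys.stderr.write('PERMISSION_DENIED: x\\n'); sys.exit(1)"]
    )
except RuntimeError as e:
    print(e)  # the message now contains PERMISSION_DENIED, not just "code 1"
```

The same idea applied inside the hook would turn "DataFlow failed with return code 1" into a message that carries the few decisive stderr lines the issue mentions.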
[GitHub] [airflow] tomrutter commented on issue #32091: Triggerer intermittent failure when running many triggerers
tomrutter commented on issue #32091: URL: https://github.com/apache/airflow/issues/32091#issuecomment-1604881767 I think task instance is set just below on line 691.
[GitHub] [airflow] tomrutter commented on issue #32091: Triggerer intermittent failure when running many triggerers
tomrutter commented on issue #32091: URL: https://github.com/apache/airflow/issues/32091#issuecomment-1604832307 Will check, but wouldn't all triggers fail if that were the case? There are many (>100) identical triggers working and only one that fails (all using the same custom trigger). I notice that if the trigger heartbeat is delayed, it can push the triggers to other triggerers. Could that open the window for the problem above under high load? The custom triggers make one HTTP request using an async call and then parse the JSON response. I don't think they are blocking, but sometimes the triggerer complains of blocking when many are running at once.
[GitHub] [airflow] stikkireddy commented on pull request #29790: Provider Databricks add jobs create operator.
stikkireddy commented on PR #29790: URL: https://github.com/apache/airflow/pull/29790#issuecomment-1604803451 @potiuk is it okay if someone else forks Kyle's repo and fix the static check to not lose @kyle-winkelman's credit/commits and also address the previous comments made by @alexott? I am also okay to wait a bit for Kyle to respond.
[GitHub] [airflow] raphaelauv commented on pull request #32004: bump debian to bookworm
raphaelauv commented on PR #32004: URL: https://github.com/apache/airflow/pull/32004#issuecomment-1604795966

Thanks for the explanation. I could build the CI image locally without mysql:

```shell
breeze ci-image build --force-build --python-image python:3.11-slim-bookworm --python 3.11
```

No MySQL release for Debian 12 yet -> https://dev.mysql.com/downloads/mysql/
[GitHub] [airflow] darkag commented on pull request #32089: Parameters for vertica
darkag commented on PR #32089: URL: https://github.com/apache/airflow/pull/32089#issuecomment-1604773094 I tried to simplify the code. I added better handling for the log_level parameter (adding the possibility to pass the log level as "warning" and not just 30), and a correction for request_complex_types, which is true by default (it was impossible to pass false with the previous version).
[GitHub] [airflow] hussein-awala commented on issue #32091: Triggerer intermittent failure when running many triggerers
hussein-awala commented on issue #32091: URL: https://github.com/apache/airflow/issues/32091#issuecomment-1604772168 > It seems that as triggers fire, the link between the trigger row and the associated task_instance for the trigger is removed before the trigger row is removed. This leaves a small amount of time where the trigger exists without an associated task_instance. The database updates are performed in a synchronous loop inside the triggerer, so with one triggerer, this is not a problem. However, it can be a problem with more than one triggerer. It should not happen even if you have multiple triggerers, because each one takes a part of the unassigned triggers (based on the capacity argument) and locks the rows in the DB until it updates them to add its ID. Also, I don't believe that we delete the task_instance before deleting the trigger, since we load them from a deque, and you get this exception when you read from it. I think your problem is with line: https://github.com/apache/airflow/blob/a1ba15570219d2fe77466367e84fafa92cbdb24e/airflow/jobs/triggerer_job_runner.py#L685 where your custom trigger doesn't set the task_instance parameter and keeps it None.
[GitHub] [airflow] pankajkoti commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
pankajkoti commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604761320 Thank you for clarifying in detail @potiuk, and for the quick fix. Yes, I was initially wondering if the image itself needed some fix, but all clear now.
[GitHub] [airflow] bhagany commented on issue #32106: GCSToBigQueryOperator and BigQueryToGCSOperator do not respect their project_id arguments
bhagany commented on issue #32106: URL: https://github.com/apache/airflow/issues/32106#issuecomment-1604752241 I should mention that I am about to go on a 2-week vacation, so while I am willing to discuss and submit a PR, my responses will be delayed. I also think this is a serious enough bug for someone else to take on if they have the time.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #32106: GCSToBigQueryOperator and BigQueryToGCSOperator do not respect their project_id arguments
boring-cyborg[bot] commented on issue #32106: URL: https://github.com/apache/airflow/issues/32106#issuecomment-1604748574 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
[GitHub] [airflow] bhagany opened a new issue, #32106: GCSToBigQueryOperator and BigQueryToGCSOperator do not respect their project_id arguments
bhagany opened a new issue, #32106: URL: https://github.com/apache/airflow/issues/32106

### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

We experienced this issue on Airflow 2.6.1, but the problem exists in the Google provider rather than core Airflow, and was introduced with [these changes](https://github.com/apache/airflow/pull/30053/files). We are using version 10.0.0 of the provider. The [issue](https://github.com/apache/airflow/issues/29958) that resulted in these changes seems to be based on an incorrect understanding of how projects interact in BigQuery -- namely, that the project used for storage and the project used for compute can be separate. The user reporting the issue appears to mistake an error about _compute_ (`User does not have bigquery.jobs.create permission in project {project-A}.`) for an error about _storage_, and this incorrect diagnosis resulted in a fix that inappropriately defaults the compute project to the project named in the destination/source (depending on the operator) table. The change attempts to allow users to override this (imo incorrect) default, but unfortunately this does not currently work, because `self.project_id` gets overridden with the named table's project [here](https://github.com/apache/airflow/pull/30053/files#diff-875bf3d1bfbba7067dc754732c0e416b8ebe7a5b722bc9ac428b98934f04a16fR512) and [here](https://github.com/apache/airflow/pull/30053/files#diff-875bf3d1bfbba7067dc754732c0e416b8ebe7a5b722bc9ac428b98934f04a16fR587).

### What you think should happen instead

I think that the easiest fix would be to revert the change, and return to defaulting the compute project to the one specified in the default google cloud connection. However, since I can understand the desire to override the `project_id`, I think handling it correctly, and clearly distinguishing between the concepts of storage and compute w/r/t projects, would also work.

### How to reproduce

Attempt to use any project for running the job other than the one named in the source/destination table.

### Operating System

debian

### Versions of Apache Airflow Providers

apache-airflow-providers-google==10.0.0

### Deployment

Other 3rd-party Helm chart

### Deployment details

_No response_

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
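The clobbering bhagany points to follows a common bug shape: the constructor stores a user-supplied project_id, but execute() reassigns the attribute from the table spec, so the override is silently lost. A stripped-down, entirely hypothetical reproduction of that shape (toy class, not the provider's actual operator code):

```python
# Minimal reproduction of the bug shape described above: a user-supplied
# project_id is clobbered in execute() by the project parsed from the
# destination table. Illustrative only -- not the real operator.
class ToyTransferOperator:
    def __init__(self, destination_table, project_id=None):
        self.destination_table = destination_table
        self.project_id = project_id  # user's intended *compute* project

    def execute_buggy(self):
        # Bug: unconditionally overrides the user's choice.
        self.project_id = self.destination_table.split(".")[0]
        return self.project_id

    def execute_fixed(self):
        # Fix: only fall back to the table's project when none was given.
        if self.project_id is None:
            self.project_id = self.destination_table.split(".")[0]
        return self.project_id

op = ToyTransferOperator("storage-proj.dataset.table", project_id="compute-proj")
print(op.execute_buggy())  # -> storage-proj  (user override lost)
op = ToyTransferOperator("storage-proj.dataset.table", project_id="compute-proj")
print(op.execute_fixed())  # -> compute-proj  (override respected)
```

The fixed variant sketches the distinction the issue asks for: the table spec decides the storage project, while the user-supplied argument (or the connection default) decides the compute project.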
[GitHub] [airflow] potiuk commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
potiuk commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604742095 Looks good !
[GitHub] [airflow] bhagany commented on issue #32093: GCSToBQ operator does not respect `project_id` in deferrable mode with impersonation chain.
bhagany commented on issue #32093: URL: https://github.com/apache/airflow/issues/32093#issuecomment-1604715011 Thank you for submitting this -- I have been procrastinating filing an issue of my own about this inconsistent handling of project ids. I believe the problems with changes to `GCSToBigQueryOperator` and `BigQueryToGCSOperator` extend further than the case you've raised, and are caused by [these changes](https://github.com/apache/airflow/pull/30053/files). Would you be willing to talk about the larger set of issues as well? I see the PR you submitted and I think your change is a good one, but I think more are needed.
[GitHub] [airflow] potiuk commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
potiuk commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604711224 BTW @pankajkoti clarifying comment: The reason why the PR to remove support was included in the list was this https://github.com/apache/airflow/pull/30963/files#diff-b3ffb75587232c802c4c65d8d27f30a07f72e1819e3a695ca5758d727bde3b31L2079 `"description": "Enable triggerer (requires Python 3.7+)."` -> `"description": "Enable triggerer"` And well - this was actually an overdue change. This should have been removed already when we got rid of Python 3.6 - I just removed it while removing Python 3.7, because I looked for all the 3.7 occurrences. So even if "Remove Python 3.7 support" was part of the PRs in the chart, the actual removal of Python 3.7 had nothing to do with the chart.
[GitHub] [airflow] vijayasarathib commented on pull request #32105: Adding missing hyperlink to the tutorial documentation
vijayasarathib commented on PR #32105: URL: https://github.com/apache/airflow/pull/32105#issuecomment-1604703092 > sure
[GitHub] [airflow] dnskr commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
dnskr commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604701317

- [Use template comments for the chart license header #30569](https://github.com/apache/airflow/pull/30569) Looks good, all license header replacements are in place.
- [Cleanup Kubernetes < 1.23 support #31847](https://github.com/apache/airflow/pull/31847) Checked. There are no `kubeVersion` references.
- [Align apiVersion and kind order in chart templates #31850](https://github.com/apache/airflow/pull/31850) Looks good, all changes are in place.
[GitHub] [airflow] potiuk commented on pull request #32105: Adding missing hyperlink to the tutorial documentation
potiuk commented on PR #32105: URL: https://github.com/apache/airflow/pull/32105#issuecomment-1604700861 Ah. Checked that the link does not work. There is a mistake with the "alias" package for the BaseOperator. The right fix is to change it to `airflow.models.baseoperator.BaseOperator` in the py:class clause. Can you please update it @vijayasarathib ?
[GitHub] [airflow] potiuk commented on pull request #32105: Adding missing hyperlink to the tutorial documentation
potiuk commented on PR #32105: URL: https://github.com/apache/airflow/pull/32105#issuecomment-1604696556 The link is there. Clicking on class should lead you there.
[GitHub] [airflow] potiuk closed pull request #32105: Adding missing hyperlink to the tutorial documentation
potiuk closed pull request #32105: Adding missing hyperlink to the tutorial documentation URL: https://github.com/apache/airflow/pull/32105 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] vijayasarathib opened a new pull request, #32105: Adding missing hyperlink to the tutorial documentation
vijayasarathib opened a new pull request, #32105: URL: https://github.com/apache/airflow/pull/32105 --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #32052: Disable allowing by default testing of connnections in UI
potiuk commented on code in PR #32052: URL: https://github.com/apache/airflow/pull/32052#discussion_r1240170213 ## airflow/www/static/js/connection_form.js: ## @@ -123,6 +125,16 @@ function applyFieldBehaviours(connection) { */ function handleTestConnection(connectionType, testableConnections) { const testButton = document.getElementById("test-connection"); + + if (!configTestConnectionEnabled) { +// If test connection is not enabled in config, disable button and display toolip +// alerting the user. +$(testButton) + .prop("disabled", true) + .attr("title", "Test connection is not enabled in config."); Review Comment: I am ok with either approach as long as we have it off by default :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch main updated: Remove unclear (referring to CI) paragraph from "support python" (#32103)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 24e3d6ce57 Remove unclear (referring to CI) paragraph from "support python" (#32103) 24e3d6ce57 is described below commit 24e3d6ce57eae1784066ed5678369e61637285a4 Author: Jarek Potiuk AuthorDate: Fri Jun 23 20:33:38 2023 +0200 Remove unclear (referring to CI) paragraph from "support python" (#32103) The paragraph was supposed to clarify things but it makes it confusing - it attempted to explain the difference vs "default" and supported versions of Python, referring to CI. Unfortunately this relation changes over time especially in transition phase like we are now where some CI runs on 3.7 (in v2-6-test branch) but some CI runs on 3.8 (in main branch). Removal of the paragraph does not change anything, it actually makes the whole chapter consistent and previous paragraph already explains the rules. I believe we should remove it. --- README.md | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 036c588e6c..aa71420ecd 100644 --- a/README.md +++ b/README.md @@ -307,18 +307,11 @@ They are based on the official release schedule of Python and Kubernetes, nicely means that we will drop support in main right after 27.06.2023, and the first MAJOR or MINOR version of Airflow released after will not have it. -2. The "oldest" supported version of Python/Kubernetes is the default one until we decide to switch to - later version. "Default" is only meaningful in terms of "smoke tests" in CI PRs, which are run using this - default version and the default reference image available. Currently `apache/airflow:latest` - and `apache/airflow:2.6.2` images are Python 3.8 images. 
This means that default reference image will - become the default at the time when we start preparing for dropping 3.8 support which is few months - before the end of life for Python 3.8. - -3. We support a new version of Python/Kubernetes in main after they are officially released, as soon as we +2. We support a new version of Python/Kubernetes in main after they are officially released, as soon as we make them work in our CI pipeline (which might not be immediate due to dependencies catching up with new versions of Python mostly) we release new images/support in Airflow based on the working CI setup. -4. This policy is best-effort which means there may be situations where we might terminate support earlier +3. This policy is best-effort which means there may be situations where we might terminate support earlier if circumstances require it. ## Base OS support for reference Airflow images
[GitHub] [airflow] potiuk merged pull request #32103: Remove unclear (referring to CI) paragraph from "support python"
potiuk merged PR #32103: URL: https://github.com/apache/airflow/pull/32103 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
potiuk commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604683871 https://github.com/apache/airflow/pull/32103 -> to remove the confusing paragraph. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #32103: Remove unclear (referring to CI) paragraph from "support python"
potiuk commented on PR #32103: URL: https://github.com/apache/airflow/pull/32103#issuecomment-1604683438 It's funny to read your own description and realize you don't understand it, or that it is more confusing than no description at all. Especially in the transition period, the original "clarification" intent makes it even more confusing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] vandonr-amz opened a new pull request, #32104: bugfix: break down run+wait method in ECS operator
vandonr-amz opened a new pull request, #32104: URL: https://github.com/apache/airflow/pull/32104 This method is causing trouble by handling several things at once; it hides the logic. A bug fixed in #31838 was reintroduced in #31881 because the check that was skipped on `wait_for_completion` was not skipped anymore. The bug is that checking the status will always fail if not waiting for completion, because obviously the task is not ready just after creation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
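The fix described above — splitting the combined run+wait method so the status check only runs after the waiter has actually completed — can be sketched roughly as follows. All names here are illustrative stand-ins, not the real `EcsRunTaskOperator` code:

```python
# Illustrative sketch only: class and method names are hypothetical,
# not the actual Airflow ECS operator implementation.
class EcsTaskRunnerSketch:
    def __init__(self, wait_for_completion: bool = True):
        self.wait_for_completion = wait_for_completion
        self.checked = False  # records whether the status check ran

    def _start_task(self) -> str:
        # Stand-in for the boto3 run_task call; returns the task ARN.
        return "arn:aws:ecs:region:account:task/example"

    def _wait_for_task(self, task_arn: str) -> None:
        # Stand-in for the ECS "tasks_stopped" waiter.
        pass

    def _check_success(self, task_arn: str) -> None:
        # Checking status right after creation would always fail
        # (the task is not ready yet), so this must only run after waiting.
        self.checked = True

    def execute(self) -> str:
        task_arn = self._start_task()
        if self.wait_for_completion:
            self._wait_for_task(task_arn)
            self._check_success(task_arn)
        return task_arn
```

Keeping the start, wait, and check steps as separate calls makes it harder to reintroduce the bug where the check runs even when `wait_for_completion` is False.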
[GitHub] [airflow] potiuk opened a new pull request, #32103: Remove unclear (referring to CI) paragraph from "support python"
potiuk opened a new pull request, #32103: URL: https://github.com/apache/airflow/pull/32103 The paragraph was supposed to clarify things but it makes it confusing - it attempted to explain the difference vs "default" and supported versions of Python, referring to CI. Unfortunately this relation changes over time especially in transition phase like we are now where some CI runs on 3.7 (in v2-6-test branch) but some CI runs on 3.8 (in main branch). Removal of the paragraph does not change anything, it actually makes the whole chapter consistent and previous paragraph already explains the rules. I believe we should remove it. --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] dstandish commented on a diff in pull request #32053: Change `as_setup` and `as_teardown` to instance methods
dstandish commented on code in PR #32053: URL: https://github.com/apache/airflow/pull/32053#discussion_r1240156463 ## airflow/example_dags/example_setup_teardown_taskflow.py: ## @@ -47,12 +45,10 @@ def normal(): @task_group def section_1(): # You can also have setup and teardown tasks at the task group level -@setup @task def my_setup(): print("I set up") -@teardown Review Comment: yeah you are right -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
potiuk commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604673066 > I am wondering if we want to include #30963 here as the PR has an associated milestone of 2.7.0 Python version is connected with Airflow version. It's not a "chart" property, it's actually an "image" property. And according to our agreed policies, the default Python version supported in Airflow 2.6 is 3.7. From: https://github.com/apache/airflow/blob/main/README.md#support-for-python-and-kubernetes-versions > We drop support for Python and Kubernetes versions when they reach EOL. Except for Kubernetes, a version stays supported by Airflow if two major cloud providers still provide support for it. We drop support for those EOL versions in main right after EOL date, and it is effectively removed when we release the first new MINOR (Or MAJOR if there is no new MINOR version) of Airflow. For example, for Python 3.8 it means that we will drop support in main right after 27.06.2023, and the first MAJOR or MINOR version of Airflow released after will not have it. I think there is one mistake in the README regarding the 2.6.2 image after we bumped version numbers (I will fix it in a moment) but the intention was that 3.7 will remain the default and released version for all 2.6* airflow/image versions. 3.8 will become the default/minimum supported version when we release 2.7.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch constraints-main updated: Updating constraints. Github run id:5358827021
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new 09e7891ea0 Updating constraints. Github run id:5358827021 09e7891ea0 is described below commit 09e7891ea0931b045d9fc1153978710f8419dc0d Author: Automated GitHub Actions commit AuthorDate: Fri Jun 23 18:13:59 2023 + Updating constraints. Github run id:5358827021 This update in constraints is automatically committed by the CI 'constraints-push' step based on 'refs/heads/main' in the 'apache/airflow' repository with commit sha 415e0767616121854b6a29b3e44387f708cdf81e. The action that build those constraints can be found at https://github.com/apache/airflow/actions/runs/5358827021/ The image tag used for that build was: 415e0767616121854b6a29b3e44387f708cdf81e. You can enter Breeze environment with this image by running 'breeze shell --image-tag 415e0767616121854b6a29b3e44387f708cdf81e' All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. 
--- constraints-3.10.txt | 2 +- constraints-3.11.txt | 2 +- constraints-3.8.txt | 2 +- constraints-3.9.txt | 2 +- constraints-no-providers-3.10.txt | 2 +- constraints-no-providers-3.11.txt | 2 +- constraints-no-providers-3.8.txt | 2 +- constraints-no-providers-3.9.txt | 2 +- constraints-source-providers-3.10.txt | 4 ++-- constraints-source-providers-3.11.txt | 4 ++-- constraints-source-providers-3.8.txt | 4 ++-- constraints-source-providers-3.9.txt | 4 ++-- 12 files changed, 16 insertions(+), 16 deletions(-) diff --git a/constraints-3.10.txt b/constraints-3.10.txt index 1c777a602b..8bf8d5fca1 100644 --- a/constraints-3.10.txt +++ b/constraints-3.10.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T16:09:54Z +# This constraints file was automatically generated on 2023-06-23T18:09:40Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-3.11.txt b/constraints-3.11.txt index 0c3142f5d7..0e38d3c5fe 100644 --- a/constraints-3.11.txt +++ b/constraints-3.11.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T16:14:03Z +# This constraints file was automatically generated on 2023-06-23T18:13:56Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. 
diff --git a/constraints-3.8.txt b/constraints-3.8.txt index 7790aa26ae..5242e7ff10 100644 --- a/constraints-3.8.txt +++ b/constraints-3.8.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T16:10:19Z +# This constraints file was automatically generated on 2023-06-23T18:10:07Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-3.9.txt b/constraints-3.9.txt index 40e1783f77..c6c7b1eb83 100644 --- a/constraints-3.9.txt +++ b/constraints-3.9.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T16:10:10Z +# This constraints file was automatically generated on 2023-06-23T18:10:01Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. diff --git a/constraints-no-providers-3.10.txt b/constraints-no-providers-3.10.txt index 756f57eb6c..ab6f86c9b2 100644 --- a/constraints-no-providers-3.10.txt +++ b/constraints-no-providers-3.10.txt @@ -1,5 +1,5 @@ # -# This constraints file was automatically generated on 2023-06-23T16:06:29Z +# This constraints file was automatically generated on 2023-06-23T18:06:19Z # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install just the 'bare' 'apache-airflow' package build from the HEAD of # the branch, without installing any of the providers. diff --git a/constraints-no-providers-3.11.txt b/constraints-no-providers-3.11.txt
[GitHub] [airflow] pankajkoti commented on issue #32096: Status of testing of Apache Airflow Helm Chart 1.10.0rc1
pankajkoti commented on issue #32096: URL: https://github.com/apache/airflow/issues/32096#issuecomment-1604651640 I am wondering if we want to include https://github.com/apache/airflow/pull/30963 here, as the PR has an associated milestone of 2.7.0. If we want to include it here, I have a question next. When I deploy the helm chart from this RC, it runs fine and example DAGs are running fine. But when I exec into pods and enter the python shell it uses Python 3.7, and example DAGs with the Python operator also seem to use Python 3.7. What should be the expected Python version here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #32053: Change `as_setup` and `as_teardown` to instance methods
ephraimbuddy commented on code in PR #32053: URL: https://github.com/apache/airflow/pull/32053#discussion_r1240114070 ## airflow/example_dags/example_setup_teardown_taskflow.py: ## @@ -47,12 +45,10 @@ def normal(): @task_group def section_1(): # You can also have setup and teardown tasks at the task group level -@setup @task def my_setup(): print("I set up") -@teardown Review Comment: Should we create another example dag to demonstrate this usage instead of removing this one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] spire-mike commented on issue #29531: Dynamic task mapping does not always create mapped tasks
spire-mike commented on issue #29531: URL: https://github.com/apache/airflow/issues/29531#issuecomment-1604625983 Thanks @knab-analytics, that's too bad. I hope it gets resolved. As a workaround I set `depends_on_past=False` and am using `ExternalTaskSensor` to ensure the last task of the previous run was successful. It's not ideal but it works for my specific situation. Just mentioning it in case it helps you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] o-nikolas commented on a diff in pull request #32052: Disable allowing by default testing of connnections in UI
o-nikolas commented on code in PR #32052: URL: https://github.com/apache/airflow/pull/32052#discussion_r1240093747 ## airflow/www/static/js/connection_form.js: ## @@ -123,6 +125,16 @@ function applyFieldBehaviours(connection) { */ function handleTestConnection(connectionType, testableConnections) { const testButton = document.getElementById("test-connection"); + + if (!configTestConnectionEnabled) { +// If test connection is not enabled in config, disable button and display toolip +// alerting the user. +$(testButton) + .prop("disabled", true) + .attr("title", "Test connection is not enabled in config."); Review Comment: To play the other Devil's advocate, I think complicating this config beyond a boolean might be over-engineering this. I think having the button there, but greyed out, is useful for discoverability, and I don't think folks will really harass the deployment manager much in reality because of it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #31733: after upgrade 2.6.0 from 2.3.4, task can't show log in webui
potiuk commented on issue #31733: URL: https://github.com/apache/airflow/issues/31733#issuecomment-1604592441 Can you please update to 2.6.2? I believe this is a manifestation of a known issue -> see release notes and look for 'logs' https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#airflow-2-6-1-2023-05-16 - you will find the issue. Generally speaking it makes very little sense to install a .0 version when bugfix-only versions are out there. I am not sure why one would do it. @Scope0910 - could you please explain why you are installing 2.6.0 when 2.6.2 (bugfix only) is already released and 2.6.1 was released a few weeks ago? Closing this one as a duplicate. We can reopen if the issue still shows after the user upgrades. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] blcksrx commented on a diff in pull request #31798: fix spark-kubernetes-operator compatibility
blcksrx commented on code in PR #31798: URL: https://github.com/apache/airflow/pull/31798#discussion_r1240077652 ## pyproject.toml: ## @@ -75,7 +75,7 @@ extend-ignore = [ # * Disable `flaky` plugin for pytest. This plugin conflicts with `rerunfailures` because provide same marker. # * Disable `nose` builtin plugin for pytest. This feature deprecated in 7.2 and will be removed in pytest>=8 # * And we focus on use native pytest capabilities rather than adopt another frameworks. -addopts = "-rasl --verbosity=2 -p no:flaky -p no:nose --asyncio-mode=strict" Review Comment: Oh sorry, that was a clumsy action -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on a diff in pull request #31798: fix spark-kubernetes-operator compatibility
potiuk commented on code in PR #31798: URL: https://github.com/apache/airflow/pull/31798#discussion_r1240072325 ## pyproject.toml: ## @@ -75,7 +75,7 @@ extend-ignore = [ # * Disable `flaky` plugin for pytest. This plugin conflicts with `rerunfailures` because provide same marker. # * Disable `nose` builtin plugin for pytest. This feature deprecated in 7.2 and will be removed in pytest>=8 # * And we focus on use native pytest capabilities rather than adopt another frameworks. -addopts = "-rasl --verbosity=2 -p no:flaky -p no:nose --asyncio-mode=strict" Review Comment: Why changing that ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch main updated: Deferrable mode for ECS operators (#31881)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 415e076761 Deferrable mode for ECS operators (#31881) 415e076761 is described below commit 415e0767616121854b6a29b3e44387f708cdf81e Author: Raphaël Vandon AuthorDate: Fri Jun 23 10:13:13 2023 -0700 Deferrable mode for ECS operators (#31881) --- airflow/providers/amazon/aws/operators/ecs.py | 160 ++--- airflow/providers/amazon/aws/triggers/ecs.py | 198 + .../providers/amazon/aws/utils/task_log_fetcher.py | 5 +- airflow/providers/amazon/provider.yaml | 3 + tests/providers/amazon/aws/operators/test_ecs.py | 74 +++- tests/providers/amazon/aws/triggers/test_ecs.py| 123 + .../amazon/aws/utils/test_task_log_fetcher.py | 2 +- 7 files changed, 532 insertions(+), 33 deletions(-) diff --git a/airflow/providers/amazon/aws/operators/ecs.py b/airflow/providers/amazon/aws/operators/ecs.py index bc8c4b70d7..2c2e93af35 100644 --- a/airflow/providers/amazon/aws/operators/ecs.py +++ b/airflow/providers/amazon/aws/operators/ecs.py @@ -35,6 +35,11 @@ from airflow.providers.amazon.aws.hooks.ecs import ( EcsHook, should_retry_eni, ) +from airflow.providers.amazon.aws.hooks.logs import AwsLogsHook +from airflow.providers.amazon.aws.triggers.ecs import ( +ClusterWaiterTrigger, +TaskDoneTrigger, +) from airflow.providers.amazon.aws.utils.task_log_fetcher import AwsTaskLogFetcher from airflow.utils.helpers import prune_dict from airflow.utils.session import provide_session @@ -67,6 +72,15 @@ class EcsBaseOperator(BaseOperator): """Must overwrite in child classes.""" raise NotImplementedError("Please implement execute() in subclass") +def _complete_exec_with_cluster_desc(self, context, event=None): +"""To be used as trigger callback for operators that return the cluster description.""" +if event["status"] != "success": +raise AirflowException(f"Error while 
waiting for operation on cluster to complete: {event}") +cluster_arn = event.get("arn") +# We cannot get the cluster definition from the waiter on success, so we have to query it here. +details = self.hook.conn.describe_clusters(clusters=[cluster_arn])["clusters"][0] +return details + class EcsCreateClusterOperator(EcsBaseOperator): """ @@ -84,9 +98,17 @@ class EcsCreateClusterOperator(EcsBaseOperator): if not set then the default waiter value will be used. :param waiter_max_attempts: The maximum number of attempts to be made, if not set then the default waiter value will be used. +:param deferrable: If True, the operator will wait asynchronously for the job to complete. +This implies waiting for completion. This mode requires aiobotocore module to be installed. +(default: False) """ -template_fields: Sequence[str] = ("cluster_name", "create_cluster_kwargs", "wait_for_completion") +template_fields: Sequence[str] = ( +"cluster_name", +"create_cluster_kwargs", +"wait_for_completion", +"deferrable", +) def __init__( self, @@ -94,8 +116,9 @@ class EcsCreateClusterOperator(EcsBaseOperator): cluster_name: str, create_cluster_kwargs: dict | None = None, wait_for_completion: bool = True, -waiter_delay: int | None = None, -waiter_max_attempts: int | None = None, +waiter_delay: int = 15, +waiter_max_attempts: int = 60, +deferrable: bool = False, **kwargs, ) -> None: super().__init__(**kwargs) @@ -104,6 +127,7 @@ class EcsCreateClusterOperator(EcsBaseOperator): self.wait_for_completion = wait_for_completion self.waiter_delay = waiter_delay self.waiter_max_attempts = waiter_max_attempts +self.deferrable = deferrable def execute(self, context: Context): self.log.info( @@ -119,6 +143,21 @@ class EcsCreateClusterOperator(EcsBaseOperator): # In some circumstances the ECS Cluster is created immediately, # and there is no reason to wait for completion. 
self.log.info("Cluster %r in state: %r.", self.cluster_name, cluster_state) +elif self.deferrable: +self.defer( +trigger=ClusterWaiterTrigger( +waiter_name="cluster_active", +cluster_arn=cluster_details["clusterArn"], +waiter_delay=self.waiter_delay, +waiter_max_attempts=self.waiter_max_attempts, +aws_conn_id=self.aws_conn_id, +region=self.region, +), +method_name="_complete_exec_with_cluster_desc", +
[GitHub] [airflow] potiuk merged pull request #31881: Deferrable mode for ECS operators
potiuk merged PR #31881: URL: https://github.com/apache/airflow/pull/31881 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #32004: bump debian to bookworm
potiuk commented on PR #32004: URL: https://github.com/apache/airflow/pull/32004#issuecomment-1604563692 The way to solve it later is to make a PR from the "airflow" repo (then the image build happens in the CI workflow and the workflow has enough permissions to push such an image to the airflow registry). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] AVMusorin opened a new pull request, #32102: Fix reload gunicorn workers
AVMusorin opened a new pull request, #32102: URL: https://github.com/apache/airflow/pull/32102 Since gunicorn can't reload new code if it starts with the ``--preload`` setting, we need to check ``reload_on_plugin_change`` before setting it up. Gunicorn can't reload new code because the code is preloaded into the master process, and since workers are launched with ``fork``, they will still have the old code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
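The incompatibility described here — ``--preload`` imports the application into the gunicorn master before workers are forked, so forked workers can never pick up reloaded code — means the flag must be dropped whenever plugin hot-reload is wanted. A minimal sketch of that condition (the helper name and argument values are hypothetical, not Airflow's actual webserver code):

```python
# Hypothetical helper illustrating the condition, not Airflow's real code.
def gunicorn_args(reload_on_plugin_change: bool) -> list:
    args = ["gunicorn", "--workers", "4", "--bind", "0.0.0.0:8080"]
    if not reload_on_plugin_change:
        # --preload forks workers from a master that already imported the
        # app, so reloading plugins in the workers would have no effect.
        # Only safe when hot-reload is not needed.
        args.append("--preload")
    return args
```

With ``reload_on_plugin_change`` enabled, the flag is omitted and each worker imports the (possibly updated) code itself.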
[GitHub] [airflow] potiuk commented on pull request #32004: bump debian to bookworm
potiuk commented on PR #32004: URL: https://github.com/apache/airflow/pull/32004#issuecomment-1604561636 Yes. Because for non-committers, "build-image" is run from "main", not from your PR, and in main you have `PYTHON_BASE_IMAGE=python:3.8-slim-bullseye`. The way to solve it is to push your branch as the `main` branch in your fork and let it run there as a "push" workflow (you can also manually build the image locally and you will see (I am quite sure) that it won't build). Something like: ``` git push origin your-branch:main -f ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] pankajastro commented on a diff in pull request #32029: Add deferrable mode in EMR operator and sensor
pankajastro commented on code in PR #32029: URL: https://github.com/apache/airflow/pull/32029#discussion_r1240051624

## airflow/providers/amazon/aws/sensors/emr.py: ##
@@ -424,12 +431,16 @@ def __init__(
     job_flow_id: str,
     target_states: Iterable[str] | None = None,
     failed_states: Iterable[str] | None = None,
+    deferrable: bool = False,

Review Comment: I'm not sure if we followed such a convention, but making the `deferrable` param consistent makes sense https://github.com/search?q=repo%3Aapache%2Fairflow+deferrable%3A+bool+path%3A%2F%5Eairflow%5C%2Fproviders%5C%2Famazon%5C%2Faws%5C%2F%2F=code -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
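The convention under discussion can be illustrated with a toy class. ``ExampleEmrSensor`` below is a made-up stand-in, not the real ``EmrJobFlowSensor``; it only shows the usual shape of a ``deferrable: bool = False`` parameter, which switches between a blocking poll and a deferral path without changing anything else in the signature.

```python
class ExampleEmrSensor:
    """Illustrative stand-in for an AWS sensor taking ``deferrable``.

    Not Airflow code: just demonstrates the keyword-default convention
    discussed in the review comment above.
    """

    def __init__(self, job_flow_id: str, deferrable: bool = False):
        self.job_flow_id = job_flow_id
        self.deferrable = deferrable

    def execute(self) -> str:
        # Real sensors would defer to a trigger here; we just report
        # which branch would be taken.
        if self.deferrable:
            return f"deferring trigger for {self.job_flow_id}"
        return f"polling {self.job_flow_id} synchronously"
```

The key point of the convention is that omitting the argument preserves the old blocking behaviour, so the change is backward compatible.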
[airflow] branch main updated: Adapt Notifier for sla_miss_callback (#31887)
This is an automated email from the ASF dual-hosted git repository. ephraimanierobi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new e4ca68818e Adapt Notifier for sla_miss_callback (#31887)
e4ca68818e is described below

commit e4ca68818eec0f29ef04a1a5bfec3241ea03bf8c
Author: Utkarsh Sharma
AuthorDate: Fri Jun 23 22:31:18 2023 +0530

    Adapt Notifier for sla_miss_callback (#31887)

    * Fix Notifier issue with sla_miss_callback
    * Add testcase to ensure the old pattern is working
    * Update document for existing notifier
    * Made notifier backward compatible with sla_miss_callback
    * Remove unwanted imports
    * Fix testcases
    * Fix static check
    * Fix PR comments
    * Removed unwanted code
    * Remove unwanted change
    * Updated var name
    * Remove kwargs
    * Fix documentation
    * Fix tests
---
 airflow/notifications/basenotifier.py              | 18 +++--
 .../pagerduty_notifier_howto_guide.rst             |  5
 .../notifications/slack_notifier_howto_guide.rst   |  5
 .../notifications/smtp_notifier_howto_guide.rst    |  5
 tests/notifications/test_basenotifier.py           | 30 ++
 .../apprise/notifications/test_apprise.py          |  4 +--
 .../pagerduty/notifications/test_pagerduty.py      |  4 +--
 tests/providers/slack/notifications/test_slack.py  |  4 +--
 tests/providers/smtp/notifications/test_smtp.py    |  4 +--
 9 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/airflow/notifications/basenotifier.py b/airflow/notifications/basenotifier.py
index 5f922812d4..7ef0603be1 100644
--- a/airflow/notifications/basenotifier.py
+++ b/airflow/notifications/basenotifier.py
@@ -79,13 +79,27 @@ class BaseNotifier(Templater):
         """
         ...

-    def __call__(self, context: Context) -> None:
+    def __call__(self, *args) -> None:
         """
         Send a notification.

         :param context: The airflow context
         """
-        context = self._update_context(context)
+        # Currently, there are two ways a callback is invoked
+        # 1. callback(context) - for on_*_callbacks
+        # 2. callback(dag, task_list, blocking_task_list, slas, blocking_tis) - for sla_miss_callback
+        # we have to distinguish between the two calls so that we can prepare the correct context,
+        if len(args) == 1:
+            context = args[0]
+        else:
+            context = {
+                "dag": args[0],
+                "task_list": args[1],
+                "blocking_task_list": args[2],
+                "slas": args[3],
+                "blocking_tis": args[4],
+            }
+        self._update_context(context)
         self.render_template_fields(context)
         try:
             self.notify(context)

diff --git a/docs/apache-airflow-providers-pagerduty/notifications/pagerduty_notifier_howto_guide.rst b/docs/apache-airflow-providers-pagerduty/notifications/pagerduty_notifier_howto_guide.rst
index 2a2388f692..4d6fd4c929 100644
--- a/docs/apache-airflow-providers-pagerduty/notifications/pagerduty_notifier_howto_guide.rst
+++ b/docs/apache-airflow-providers-pagerduty/notifications/pagerduty_notifier_howto_guide.rst
@@ -23,6 +23,11 @@ Introduction
 The Pagerduty notifier (:class:`airflow.providers.pagerduty.notifications.pagerduty.PagerdutyNotifier`) allows users to send messages to Pagerduty using the various ``on_*_callbacks`` at both the DAG level and Task level.
+You can also use a notifier with ``sla_miss_callback``.
+
+.. note::
+    When notifiers are used with `sla_miss_callback` the context will contain only values passed to the callback, refer :ref:`sla_miss_callback`.
+
 Example Code:
 -------------

diff --git a/docs/apache-airflow-providers-slack/notifications/slack_notifier_howto_guide.rst b/docs/apache-airflow-providers-slack/notifications/slack_notifier_howto_guide.rst
index c6b087398c..d967779cee 100644
--- a/docs/apache-airflow-providers-slack/notifications/slack_notifier_howto_guide.rst
+++ b/docs/apache-airflow-providers-slack/notifications/slack_notifier_howto_guide.rst
@@ -23,6 +23,11 @@ Introduction
 Slack notifier (:class:`airflow.providers.slack.notifications.slack.SlackNotifier`) allows users to send messages to a slack channel using the various ``on_*_callbacks`` at both the DAG level and Task level
+You can also use a notifier with ``sla_miss_callback``.
+
+.. note::
+    When notifiers are used with `sla_miss_callback` the context will contain only values passed to the callback, refer :ref:`sla_miss_callback`.
+
 Example Code:
 -------------

diff --git a/docs/apache-airflow-providers-smtp/notifications/smtp_notifier_howto_guide.rst b/docs/apache-airflow-providers-smtp/notifications/smtp_notifier_howto_guide.rst
index
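The argument-count dispatch that the commit above adds to ``BaseNotifier.__call__`` can be shown standalone. The helper below is an illustrative sketch, not the actual ``BaseNotifier`` class: it only reproduces the logic that tells the two callback signatures apart and builds the right context.

```python
def build_callback_context(*args):
    """Sketch of the dispatch in BaseNotifier.__call__ (illustrative only).

    on_*_callbacks invoke callback(context) with a single argument, while
    sla_miss_callback invokes callback(dag, task_list, blocking_task_list,
    slas, blocking_tis) with five, so the positional-argument count is
    enough to distinguish the two call styles.
    """
    if len(args) == 1:
        # on_*_callback: the context dict is passed through unchanged.
        return args[0]
    # sla_miss_callback: assemble a context dict from the five positionals.
    return {
        "dag": args[0],
        "task_list": args[1],
        "blocking_task_list": args[2],
        "slas": args[3],
        "blocking_tis": args[4],
    }
```

This is why the docs notes added in the same commit warn that with ``sla_miss_callback`` the context contains only the values passed to the callback, rather than the full Airflow task context.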
[GitHub] [airflow] ephraimbuddy merged pull request #31887: Adapt Notifier for sla_miss_callback
ephraimbuddy merged PR #31887: URL: https://github.com/apache/airflow/pull/31887 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #32101: Fix backfill_job_runner to work with custom executors
boring-cyborg[bot] commented on PR #32101: URL: https://github.com/apache/airflow/pull/32101#issuecomment-1604522311

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst). Here are some useful points:

- Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
- Consider using the [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally; it's a heavy docker image, but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Please follow the [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication, including (but not limited to) comments on Pull Requests, the Mailing list and Slack.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).

Apache Airflow is a community-driven project and together we are making it better. In case of doubts contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://s.apache.org/airflow-slack -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] adh-wonolo opened a new pull request, #32101: Fix backfill_job_runner to work with custom executors
adh-wonolo opened a new pull request, #32101: URL: https://github.com/apache/airflow/pull/32101

Backfill Job Runner pulls in the class name of your executor but doesn't pull in the full path, so if you aren't using a default core executor you get an error like:

```
  File "/Users/adh/.pyenv/versions/3.10.7/envs/airflow310/lib/python3.10/site-packages/airflow/utils/module_loading.py", line 32, in import_string
    module_path, class_name = dotted_path.rsplit(".", 1)
ValueError: not enough values to unpack (expected 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/adh/.pyenv/versions/3.10.7/envs/airflow310/lib/python3.10/site-packages/airflow/executors/executor_loader.py", line 106, in load_executor
    executor_cls, import_source = cls.import_executor_cls(executor_name)
  File "/Users/adh/.pyenv/versions/3.10.7/envs/airflow310/lib/python3.10/site-packages/airflow/executors/executor_loader.py", line 148, in import_executor_cls
    return _import_and_validate(executor_name), ConnectorSource.CUSTOM_PATH
  File "/Users/adh/.pyenv/versions/3.10.7/envs/airflow310/lib/python3.10/site-packages/airflow/executors/executor_loader.py", line 129, in _import_and_validate
    executor = import_string(path)
  File "/Users/adh/.pyenv/versions/3.10.7/envs/airflow310/lib/python3.10/site-packages/airflow/utils/module_loading.py", line 34, in import_string
    raise ImportError(f"{dotted_path} doesn't look like a module path")
ImportError: CustomExecutor doesn't look like a module path
```

This can be fixed either by just passing in the actual class to `executor_class`, by passing `f"{self.job.executor.__class__.__module__}.{self.job.executor_class}"` to `ExecutorLoader.import_executor_cls`, or by setting `self.job.executor_class` to `f"{self.job.executor.__class__.__module__}.{self.job.executor.__class__.__name__}"`. I'm not sure which of these three is the best solution, though in my quick read through the code it seems like this isn't really called elsewhere besides in this specific file. I ran the core tests and they all passed. --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
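The second and third fixes proposed above both amount to building a full dotted path from the executor instance. The sketch below is a simplified illustration, not the actual Airflow code: `full_dotted_path` derives `"module.ClassName"` from any instance, and `import_string` is a minimal re-implementation of `airflow.utils.module_loading.import_string` showing why a bare class name like `CustomExecutor` fails.

```python
import importlib


def full_dotted_path(obj) -> str:
    """Return 'package.module.ClassName' for an instance's class."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def import_string(dotted_path: str):
    """Minimal sketch of airflow.utils.module_loading.import_string.

    A path without a dot cannot be split into module + class, which is
    exactly the ImportError shown in the traceback above.
    """
    try:
        module_path, class_name = dotted_path.rsplit(".", 1)
    except ValueError:
        raise ImportError(f"{dotted_path} doesn't look like a module path")
    return getattr(importlib.import_module(module_path), class_name)
```

Feeding the output of `full_dotted_path` into `import_string` round-trips cleanly, which is the property any of the three proposed fixes would restore.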
[airflow] branch constraints-main updated: Updating constraints. Github run id:5357825290
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new 63a8e19d8e Updating constraints. Github run id:5357825290 63a8e19d8e is described below commit 63a8e19d8ea4f499041a55dce3eafaa38245f32e Author: Automated GitHub Actions commit AuthorDate: Fri Jun 23 16:14:06 2023 + Updating constraints. Github run id:5357825290 This update in constraints is automatically committed by the CI 'constraints-push' step based on 'refs/heads/main' in the 'apache/airflow' repository with commit sha b156db3a70cca5b3d231c0c49f013fbd0af5d194. The action that build those constraints can be found at https://github.com/apache/airflow/actions/runs/5357825290/ The image tag used for that build was: b156db3a70cca5b3d231c0c49f013fbd0af5d194. You can enter Breeze environment with this image by running 'breeze shell --image-tag b156db3a70cca5b3d231c0c49f013fbd0af5d194' All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. 
---
 constraints-3.10.txt                  | 162 +-
 constraints-3.11.txt                  | 160 -
 constraints-3.8.txt                   | 162 +-
 constraints-3.9.txt                   | 162 +-
 constraints-no-providers-3.10.txt     |   2 +-
 constraints-no-providers-3.11.txt     |   2 +-
 constraints-no-providers-3.8.txt      |   2 +-
 constraints-no-providers-3.9.txt      |   2 +-
 constraints-source-providers-3.10.txt |   2 +-
 constraints-source-providers-3.11.txt |   2 +-
 constraints-source-providers-3.8.txt  |   2 +-
 constraints-source-providers-3.9.txt  |   2 +-
 12 files changed, 331 insertions(+), 331 deletions(-)

diff --git a/constraints-3.10.txt b/constraints-3.10.txt
index c021b9824b..1c777a602b 100644
--- a/constraints-3.10.txt
+++ b/constraints-3.10.txt
@@ -1,5 +1,5 @@
 #
-# This constraints file was automatically generated on 2023-06-23T14:11:05Z
+# This constraints file was automatically generated on 2023-06-23T16:09:54Z
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint generation.
@@ -60,85 +60,85 @@
 analytics-python==1.4.post1
 ansiwrap==0.8.4
 anyascii==0.3.2
 anyio==3.7.0
-apache-airflow-providers-airbyte==3.3.0
-apache-airflow-providers-alibaba==2.4.0
-apache-airflow-providers-amazon==8.1.0
-apache-airflow-providers-apache-beam==5.1.0
-apache-airflow-providers-apache-cassandra==3.2.0
-apache-airflow-providers-apache-drill==2.4.0
-apache-airflow-providers-apache-druid==3.4.0
-apache-airflow-providers-apache-flink==1.1.0
-apache-airflow-providers-apache-hdfs==4.0.0
-apache-airflow-providers-apache-hive==6.1.0
-apache-airflow-providers-apache-impala==1.1.0
-apache-airflow-providers-apache-kafka==1.1.0
-apache-airflow-providers-apache-kylin==3.2.0
-apache-airflow-providers-apache-livy==3.5.0
-apache-airflow-providers-apache-pig==4.1.0
-apache-airflow-providers-apache-pinot==4.1.0
-apache-airflow-providers-apache-spark==4.1.0
-apache-airflow-providers-apache-sqoop==3.2.0
-apache-airflow-providers-arangodb==2.2.0
-apache-airflow-providers-asana==2.2.0
-apache-airflow-providers-atlassian-jira==2.1.0
-apache-airflow-providers-celery==3.2.0
-apache-airflow-providers-cloudant==3.2.0
-apache-airflow-providers-cncf-kubernetes==7.0.0
-apache-airflow-providers-common-sql==1.5.1
-apache-airflow-providers-databricks==4.2.0
-apache-airflow-providers-datadog==3.3.0
-apache-airflow-providers-dbt-cloud==3.2.0
-apache-airflow-providers-dingding==3.2.0
-apache-airflow-providers-discord==3.2.0
-apache-airflow-providers-docker==3.7.0
-apache-airflow-providers-elasticsearch==4.5.0
-apache-airflow-providers-exasol==4.2.0
-apache-airflow-providers-facebook==3.2.0
-apache-airflow-providers-ftp==3.4.1
-apache-airflow-providers-github==2.3.0
-apache-airflow-providers-google==10.1.1
-apache-airflow-providers-grpc==3.2.0
-apache-airflow-providers-hashicorp==3.4.0
-apache-airflow-providers-http==4.4.1
-apache-airflow-providers-imap==3.2.1
-apache-airflow-providers-influxdb==2.2.0
-apache-airflow-providers-jdbc==3.4.0
-apache-airflow-providers-jenkins==3.3.0
-apache-airflow-providers-microsoft-azure==6.1.1
-apache-airflow-providers-microsoft-mssql==3.4.0
-apache-airflow-providers-microsoft-psrp==2.3.0
-apache-airflow-providers-microsoft-winrm==3.2.0
-apache-airflow-providers-mongo==3.2.0
-apache-airflow-providers-mysql==5.1.0
-apache-airflow-providers-neo4j==3.3.0
[GitHub] [airflow] eladkal closed issue #32030: Status of testing Providers that were prepared on June 20, 2023
eladkal closed issue #32030: Status of testing Providers that were prepared on June 20, 2023 URL: https://github.com/apache/airflow/issues/32030 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] eladkal commented on issue #32030: Status of testing Providers that were prepared on June 20, 2023
eladkal commented on issue #32030: URL: https://github.com/apache/airflow/issues/32030#issuecomment-1604465569 Thank you everyone. Providers are released. I invite everyone to help improve providers for the next release; a list of open issues can be found [here](https://github.com/apache/airflow/issues?q=is%3Aopen+is%3Aissue+label%3Aarea%3Aproviders). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow-site] eladkal merged pull request #819: Add documentation for packages - 2023-06-20
eladkal merged PR #819: URL: https://github.com/apache/airflow-site/pull/819 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org