Hi Folks,
To close this one off I want to mention some more information we were able to 
acquire. This may help, in particular people running Airflow on K8s.
If you define a custom XCom backend in your values.yaml configuration and 
Airflow fails to load the class, the entire Chart deployment will fail with 
each pod container attempting to restart time and time again.
The problem is that it is very difficult to acquire logs from the container 
because there is a very small window of availability where the trace can be 
obtained. If you are fortunate enough to query the container logs at the right 
time, you will see something similar to the following

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__main__.py", 
line 48, in main
    args.func(args)
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", 
line 47, in command
    func = import_string(import_path)
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/module_loading.py",
 line 32, in import_string
    module = import_module(module_path)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in 
import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/db_command.py",
 line 24, in <module>
    from airflow.utils import cli as cli_utils, db
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/db.py", 
line 27, in <module>
    from airflow.jobs.base_job import BaseJob  # noqa: F401
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/__init__.py", 
line 19, in <module>
    import airflow.jobs.backfill_job
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/backfill_job.py",
 line 28, in <module>
    from airflow import models
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/__init__.py", 
line 20, in <module>
    from airflow.models.baseoperator import BaseOperator, BaseOperatorLink
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/baseoperator.py",
 line 61, in <module>
    from airflow.models.taskinstance import Context, TaskInstance, 
clear_task_instances
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py",
 line 82, in <module>
    from airflow.models.xcom import XCOM_RETURN_KEY, XCom
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/xcom.py", line 
379, in <module>
    XCom = resolve_xcom_backend()
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/xcom.py", line 
369, in resolve_xcom_backend
    clazz = conf.getimport("core", "xcom_backend", 
fallback=f"airflow.models.xcom.{BaseXCom.__name__}")
  File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/configuration.py", 
line 485, in getimport
    raise AirflowConfigException(
airflow.exceptions.AirflowConfigException: The object could not be loaded. 
Please check "xcom_backend" key in "core" section. Current value: 
"xcom_custom_backend.S3XComBackend".
[2022-01-06 00:02:16,880] {settings.py:331} DEBUG - Disposing DB connection 
pool (PID 214)

As you can see, clearly in this example the path to the customer xcom is 
incorrect. 

I am going to propose an improvement to xcom.resolve_xcom_backend() which 
validates the custom XCom backend before returning the value to be interpreted 
by configuration. In the case that the xcom value is invalid, we simply fall 
back to airflow.models.xcom.{BaseXCom.__name__}.
This way we can catch things like incorrect paths, etc. before they lead to 
cryptic, difficult to catch deployment failures. 

I'll start working on the documentation patch and then 
xcom.resolve_xcom_backend() and unit tests.

On 2022/01/05 23:19:09 Daniel Standish wrote:
> Looks like you replied just before me.
> 
> You should not need to do anything beyond confirming that airflow config
> resolves the right xcom.  Its usage should be  the same.  E.g. ti.xcom_push
> etc.
> 
> Your  dags and tasks should remain unchanged.
> 

Reply via email to