Hi all,
Is there a way to get the Airflow Worker (started by Celery) to retry
connecting to the Metadata DB by default when it times out?
And, if it's related, what does the worker_precheck setting do when set to True?
Will it retry the connection if it fails?
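To make the first question concrete, the behaviour I'm hoping the worker could
apply by default is roughly the following (a plain SQLAlchemy/pymysql sketch
with placeholder credentials, retry counts and delays, not actual Airflow code):

import time
from sqlalchemy import create_engine, text
from sqlalchemy.exc import OperationalError

# Placeholder connection string; the real one lives in sql_alchemy_conn.
engine = create_engine(
    "mysql+pymysql://airflow:***@my_db_host.my.internal_domain.net/airflow",
    connect_args={"connect_timeout": 10},
)

def connect_with_retry(engine, attempts=3, delay=5):
    """Retry the metadata-DB connection a few times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            conn = engine.connect()
            conn.execute(text("SELECT 1"))  # simple liveness check
            return conn
        except OperationalError:
            if attempt == attempts:
                raise
            time.sleep(delay)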
Details of the issue:
I'm currently on Airflow 1.10.6 using the CeleryExecutor with Redis and a MySQL
metadata DB, and recently a few tasks have been failing before they start.
Airflow sends out an email that says:
Executor reports task instance finished (failed) although the task says its
queued. Was the task killed externally?
Digging into the Airflow worker stderr, I see this exception:
[2020-02-24 06:08:14,718: INFO/ForkPoolWorker-9] Executing command in Celery:
['airflow', 'run', 'my_dag_id', 'my_task_id', '2020-02-23T10:00:00+00:00',
'--local', '--pool', 'default_pool', '-sd', '.../dag_creator.py']
[2020-02-24 06:08:26,986: ERROR/ForkPoolWorker-9] execute_command encountered a
CalledProcessError
Traceback (most recent call last):
File ".../lib/python3.7/site-packages/airflow/executors/celery_executor.py",
line 67, in execute_command
close_fds=True, env=env)
File ".../lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['airflow', 'run', 'my_dag_id',
'my_task_id', '2020-02-23T10:00:00+00:00', '--local', '--pool', 'default_pool',
'-sd', '.../dag_creator.py']' returned non-zero exit status 1.
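From that traceback, my (simplified) reading is that the worker just launches
the task as an 'airflow run' subprocess, and check_call raises
CalledProcessError when that subprocess exits non-zero, roughly like this
(command copied from the Celery log above, -sd path truncated as in the log).
That seems to be what turns the DB timeout below into the "killed externally"
email:

import subprocess

command = ['airflow', 'run', 'my_dag_id', 'my_task_id',
           '2020-02-23T10:00:00+00:00', '--local',
           '--pool', 'default_pool', '-sd', '.../dag_creator.py']

# check_call raises CalledProcessError on any non-zero exit status.
subprocess.check_call(command, close_fds=True)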
Digging into the Airflow worker stdout at the same time, I see a failed
connection to the metadata DB:
[2020-02-24 06:08:16,620] {cli.py:545} INFO - Running <TaskInstance:
my_dag_id.my_task_id 2020-02-23T10:00:00+00:00 [queued]> on host
my_app_host.my.internal_domain.net
Traceback (most recent call last):
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/connections.py",
line 583, in connect
**kwargs)
File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/socket.py",
line 727, in create_connection
raise err
File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/socket.py",
line 716, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/engine/base.py",
line 2276, in _wrap_pool_connect
return fn()
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py",
line 363, in connect
return _ConnectionFairy._checkout(self)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py",
line 760, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py",
line 492, in checkout
rec = pool._do_get()
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/impl.py",
line 238, in _do_get
return self._create_connection()
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py",
line 308, in _create_connection
return _ConnectionRecord(self)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py",
line 437, in __init__
self.__connect(first_connect_check=True)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py",
line 639, in __connect
connection = pool._invoke_creator(self)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py",
line 114, in connect
return dialect.connect(*cargs, **cparams)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/engine/default.py",
line 482, in connect
return self.dbapi.connect(*cargs, **cparams)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/__init__.py",
line 94, in Connect
return Connection(*args, **kwargs)
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/connections.py",
line 325, in __init__
self.connect()
File
"/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/connections.py",
line 630, in connect
raise exc
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on
'my_db_host.my.internal_domain.net' (timed out)")
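If it helps to narrow things down, I can test the raw connection from the
worker host with pymysql directly (user/password/database below are
placeholders):

import pymysql

conn = pymysql.connect(
    host='my_db_host.my.internal_domain.net',
    user='airflow',
    password='***',
    database='airflow',
    connect_timeout=10,  # same sort of socket timeout as in the traceback
)
with conn.cursor() as cur:
    cur.execute('SELECT 1')
    print(cur.fetchone())
conn.close()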
Any help is appreciated.
Regards
Damian