Hello,

I'm running Airflow v2.5.3 in a Dockerized setup and am seeing a strange
SSH timeout with an rsync command.

I'm not sure why, but the SSH timeout error occurs only on the first try.
After the first retry there is no timeout error. This happens on every run.
Please advise.

I can confirm there is no network issue. I suspect the webserver and
scheduler containers are in an unhealthy state.



===
rsync_work_to_scratch_command = """
        rsync -av "{{ dag_run.conf["work_dir"] }}/" "{{ dag_run.conf["scratch_dir"] }}/"
"""

rsync_work_to_scratch_task = SSHOperator(
    task_id='rsync_work_to_scratch',
    ssh_hook=ssh_hook,
    command=rsync_work_to_scratch_command,
    on_success_callback=None,
    on_failure_callback=update_failure,
    retries=15,
    dag=dag
)
===

Error log for the SSH timeout.
===
[2023-07-12T14:47:46.179+0400] {ssh.py:516} INFO - Data/Intensities/BaseCalls/L001/C1.1/L001_2.cbcl
[2023-07-12T14:47:56.202+0400] {taskinstance.py:1776} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/ssh/operators/ssh.py", line 173, in execute
    result = self.run_ssh_client_command(ssh_client, self.command, context=context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/ssh/operators/ssh.py", line 159, in run_ssh_client_command
    ssh_client, command, environment=self.environment, get_pty=self.get_pty
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/ssh/hooks/ssh.py", line 541, in exec_ssh_client_command
    raise AirflowException("SSH command timed out")
airflow.exceptions.AirflowException: SSH command timed out
[2023-07-12T14:47:56.216+0400] {taskinstance.py:1332} INFO - Marking task as UP_FOR_RETRY. dag_id=10X_sequence, task_id=rsync_work_to_scratch, execution_date=20230712T104644, start_date=20230712T104647, end_date=20230712T104756
[2023-07-12T14:47:56.230+0400] {standard_task_runner.py:105} ERROR - Failed to execute job 82 for task rsync_work_to_scratch (SSH command timed out; 12951)
[2023-07-12T14:47:56.260+0400] {local_task_job.py:212} INFO - Task exited with return code 1
[2023-07-12T14:47:56.339+0400] {taskinstance.py:2596} INFO - 0 downstream tasks scheduled from follow-on schedule check
===
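
One detail in the log above may be worth checking: the time between the last
rsync output line and the "Task failed" line. The sketch below only does that
arithmetic on the two timestamps copied from the log; the names
`seconds_between`, `LAST_OUTPUT` and `TASK_FAILED` are made up for
illustration. A gap of about 10 seconds would be consistent with a
per-command timeout that fires when the remote command stays silent for that
long (for example while one large file is being copied). Whether that is
what happens here, and whether the operator's `cmd_timeout` argument is the
relevant setting in the installed SSH provider version, is an assumption to
verify and not something the log proves.

```python
from datetime import datetime

# Timestamps copied verbatim from the log excerpt above.
LAST_OUTPUT = "2023-07-12T14:47:46.179+0400"  # last rsync output line
TASK_FAILED = "2023-07-12T14:47:56.202+0400"  # "Task failed with exception"

# Airflow log timestamp layout: date, time with milliseconds, UTC offset.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


gap = seconds_between(LAST_OUTPUT, TASK_FAILED)
print(f"silence before timeout: {gap:.3f} s")
```

If the gap does match the configured command timeout, it would also fit the
observation that retries succeed, since rsync skips files that a previous
attempt already transferred.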

Thanks
Jay
