Re: [I] Airflow not retrying Zombie even after detection [airflow]
rcheatham-q commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2596263452

Can we reopen this? I just encountered this issue on version 2.10.4 and I'm confident I can reliably reproduce it. The high-level steps would be:

1. Deploy Airflow in k8s with a task that runs for a few minutes (it can simply sleep) and has at least 1 retry configured. I don't think the executor matters; it could be Celery or Kubernetes.
2. While the task is running, forcefully kill the pod running the task without a termination grace period:
   ```
   kubectl delete --grace-period=1 pod/...
   ```
3. Wait for the scheduler to detect the zombie task and trigger a retry.

### Expected behavior:
The scheduler detects the zombie task and schedules a retry.

### Actual behavior:
The scheduler detects the zombie task and marks it as failed.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
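The expected behavior described in the report above can be written down as a small decision rule. This is only a sketch of what the reporter expects, not Airflow's actual scheduler code: the function name `state_after_zombie`, its parameters, and the 1-based attempt numbering are assumptions made for illustration.

```python
def state_after_zombie(try_number: int, retries: int) -> str:
    """Return the state a killed (zombie) task attempt is expected to end in.

    Hypothetical helper, not part of Airflow.
    try_number: 1-based number of the attempt that was killed.
    retries:    retries configured on the task (total attempts = retries + 1).
    """
    # An attempt that still has retries left should be rescheduled.
    if try_number <= retries:
        return "up_for_retry"
    # Only once every retry is used up should the task be marked failed.
    return "failed"


if __name__ == "__main__":
    # First attempt of a task with one retry configured, as in the report.
    print(state_after_zombie(try_number=1, retries=1))
```

Under this rule, the scenario in the report (first attempt killed, 1 retry configured) should end in `up_for_retry`; the reported bug is that it ends in `failed` instead.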
Re: [I] Airflow not retrying Zombie even after detection [airflow]
github-actions[bot] closed issue #42135: Airflow not retrying Zombie even after detection URL: https://github.com/apache/airflow/issues/42135
Re: [I] Airflow not retrying Zombie even after detection [airflow]
github-actions[bot] commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2561502186 This issue has been closed because it has not received response from the issue author.
Re: [I] Airflow not retrying Zombie even after detection [airflow]
github-actions[bot] commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2547213268 This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
Re: [I] Airflow not retrying Zombie even after detection [airflow]
potiuk commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2511200764 I think one of the reasons that could cause this has been fixed in 2.10.3: https://github.com/apache/airflow/pull/42932. Can you please upgrade and see if your problem is gone, @darren-stults-sp @vaibhavnsingh @kand617? This is the easiest way to check it (and upgrading to the latest Airflow version is a good idea regardless).
Re: [I] Airflow not retrying Zombie even after detection [airflow]
darren-stults-sp commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2489723964 +1 on `Airflow v2.10.1`. I'm seeing something similar: zombie detection fires multiple times for a single retry. Based on the code, I would expect the job to be terminated, but the zombie detector log event (when it appears) seems to reappear roughly every 12 seconds, for about 8 to 20 repeats.
Re: [I] Airflow not retrying Zombie even after detection [airflow]
vaibhavnsingh commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2346746162 +1 In our current setup, we are using Celery workers as Airflow workers, and we have applied a memory limit on these workers as per our DevOps guidelines. When a Celery worker exceeds its memory limit, it encounters an Out-Of-Memory (OOM) error and restarts. This behavior leads to tasks that were in a running state becoming zombie tasks, which the Airflow scheduler detects. However, we have observed that despite the scheduler detecting these tasks as zombie tasks, Airflow does not mark them as failed.
Re: [I] Airflow not retrying Zombie even after detection [airflow]
boring-cyborg[bot] commented on issue #42135: URL: https://github.com/apache/airflow/issues/42135#issuecomment-2341173130 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
