Re: [I] Airflow not retrying Zombie even after detection [airflow]

2025-01-16 Thread via GitHub


rcheatham-q commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2596263452

   Can we reopen this? I just encountered this issue on version 2.10.4 and I'm 
confident I can reliably reproduce it. The high-level steps would be:
   
   1. Deploy Airflow in k8s with a task that runs for a few minutes (it can 
simply sleep) and has at least 1 retry configured. I don't think the executor 
matters; it could be Celery or Kubernetes.
   1. While the task is running, forcefully kill the pod running the task 
without a termination grace period
   
  ```
  kubectl delete --grace-period=1 pod/...
  ```
   
   1. Wait for the scheduler to detect the zombie task and trigger a retry
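
The task from step 1 could be sketched as a DAG file along the following lines. This is a minimal sketch, not the reporter's actual DAG: the DAG id, task id, sleep duration, and retry delay are all assumptions chosen for illustration. The Airflow imports are done lazily inside a hypothetical `build_dag` helper only so that the file can also be imported where Airflow is not installed.

```python
"""Minimal reproduction DAG: one long-sleeping task with a single retry."""
import time
from datetime import datetime, timedelta

SLEEP_SECONDS = 300  # assumed value for "a few minutes"
RETRIES = 1          # "at least 1 retry configured"


def sleep_task(seconds: int = SLEEP_SECONDS) -> None:
    """Only sleep, so the task is still running when its pod is killed."""
    time.sleep(seconds)


def build_dag():
    # Lazy imports: this file stays importable without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="zombie_retry_repro",  # hypothetical name
        start_date=datetime(2025, 1, 1),
        schedule=None,  # no schedule; trigger the run manually
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="sleep_a_few_minutes",  # hypothetical name
            python_callable=sleep_task,
            retries=RETRIES,
            retry_delay=timedelta(seconds=30),  # assumed value
        )
    return dag


try:
    # A module-level DAG object is what lets the scheduler discover the DAG.
    dag = build_dag()
except ImportError:
    dag = None  # Airflow is not installed; nothing to register
```

Trigger the DAG manually, then delete the task's pod while `sleep_a_few_minutes` is still running, as described in step 2.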
   
   ### Expected behavior:
   The scheduler detects the zombie task and schedules a retry
   
   ### Actual behavior:
   The scheduler detects the zombie task and marks it as failed
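
The difference between the expected and the actual behavior can be stated as a small decision rule. The sketch below is a simplified model of that rule, not Airflow's implementation: `TaskAttempt` and `expected_state_after_zombie` are hypothetical names, and only the state strings `up_for_retry` and `failed` correspond to real Airflow task states.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    try_number: int  # 1-based attempt that was running when the worker died
    retries: int     # number of retries configured on the task


def expected_state_after_zombie(attempt: TaskAttempt) -> str:
    """Return the state a zombie task should end up in under this model."""
    max_tries = 1 + attempt.retries  # first attempt plus configured retries
    if attempt.try_number < max_tries:
        return "up_for_retry"  # attempts remain, so schedule a retry
    return "failed"  # all attempts used up


# The scenario from the steps above: first attempt killed, 1 retry configured.
print(expected_state_after_zombie(TaskAttempt(try_number=1, retries=1)))
```

Under this model the reproduction scenario should yield `up_for_retry`, whereas the report describes the task going straight to `failed`.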


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-12-24 Thread via GitHub


github-actions[bot] closed issue #42135: Airflow not retrying Zombie even after 
detection
URL: https://github.com/apache/airflow/issues/42135





Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-12-24 Thread via GitHub


github-actions[bot] commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2561502186

   This issue has been closed because it has not received a response from the 
issue author.





Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-12-16 Thread via GitHub


github-actions[bot] commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2547213268

   This issue has been automatically marked as stale because it has been open 
for 14 days with no response from the author. It will be closed in the next 7 
days if no further activity occurs from the issue author.





Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-12-02 Thread via GitHub


potiuk commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2511200764

   I think one of the reasons that could cause this has been fixed in 2.10.3: 
https://github.com/apache/airflow/pull/42932. Can you please upgrade and see 
if your problem is gone, @darren-stults-sp @vaibhavnsingh @kand617? This is 
the easiest way to check it (and upgrading to the latest Airflow version is a 
good idea regardless).





Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-11-20 Thread via GitHub


darren-stults-sp commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2489723964

   +1
   
   On `Airflow v2.10.1`
   
   Seeing a similar thing where zombie detection happens multiple times for a 
single retry. Based on the code, I would expect the job to be terminated, but 
the zombie detector log event (when it appears) seems to reappear every 12 or 
so seconds for about 8 to 20 repeats.





Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-09-12 Thread via GitHub


vaibhavnsingh commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2346746162

   +1 
   
   In our current setup, we are using Celery workers as Airflow workers, and we 
have applied a memory limit on these workers as per our DevOps guidelines. When 
a Celery worker exceeds its memory limit, it encounters an Out-Of-Memory (OOM) 
error and restarts. This behavior leads to tasks that were in a running state 
becoming zombie tasks, which the Airflow scheduler detects.
   
   However, we have observed that despite the scheduler detecting these tasks 
as zombie tasks, Airflow does not mark them as failed.





Re: [I] Airflow not retrying Zombie even after detection [airflow]

2024-09-10 Thread via GitHub


boring-cyborg[bot] commented on issue #42135:
URL: https://github.com/apache/airflow/issues/42135#issuecomment-2341173130

   Thanks for opening your first issue here! Be sure to follow the issue 
template! If you are willing to raise a PR to address this issue, please do 
so; there is no need to wait for approval.
   

