Re: [PR] add `task_pending_timeout` to `CeleryExecutor` [airflow]

2025-12-06 Thread via GitHub


antonlin1 closed pull request #57832: add `task_pending_timeout` to 
`CeleryExecutor`
URL: https://github.com/apache/airflow/pull/57832


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] add `task_pending_timeout` to `CeleryExecutor` [airflow]

2025-11-05 Thread via GitHub


zach-overflow commented on code in PR #57832:
URL: https://github.com/apache/airflow/pull/57832#discussion_r2496454143


##
providers/celery/src/airflow/providers/celery/executors/celery_executor.py:
##
@@ -443,8 +472,26 @@ def update_task_state(self, key: TaskInstanceKey, state: 
str, info: Any) -> None
 self.success(key, info)
 elif state in (celery_states.FAILURE, celery_states.REVOKED):
 self.fail(key, info)
-elif state in (celery_states.STARTED, celery_states.PENDING, 
celery_states.RETRY):
-pass
+elif state == celery_states.PENDING:
+if self._pending_task_timed_out(key):
+pending_duration = time.monotonic() - 
self.task_pending_since[key]
+self.log.warning(
+"Task %s has been PENDING in Celery for %.1f seconds 
(timeout: %d seconds). "
+"Failing task. This typically indicates the task was 
sent to a non-existent "
+"queue or no workers are available to pick it up.",

Review Comment:
   Any reason for using the `%`-substitution instead of f-strings here?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] add `task_pending_timeout` to `CeleryExecutor` [airflow]

2025-11-05 Thread via GitHub


zach-overflow commented on code in PR #57832:
URL: https://github.com/apache/airflow/pull/57832#discussion_r2496449078


##
providers/celery/src/airflow/providers/celery/executors/celery_executor.py:
##
@@ -435,6 +438,32 @@ def change_state(
 ) -> None:
 super().change_state(key, state, info, remove_running=remove_running)
 self.tasks.pop(key, None)
+self.task_pending_since.pop(key, None)
+
+def _pending_task_timed_out(self, key: TaskInstanceKey) -> bool:
+"""
+Check if a task has been stuck in celery_states.PENDING state beyond 
the timeout threshold.
+
+This to guard against task leak scenario:
+ 1. Celery executor sends task to a specific celery queue
+ 2. Celery worker is unavailable for the given queue. Celery task is 
in PENDING state.

Review Comment:
   maybe to clarify this a bit, and how this failure mode can occur even if 
workers are technically "available"
   ```suggestion
2. No Celery workers are consuming from the given queue. Celery 
task will remain in PENDING state.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]