Re: [PR] add `task_pending_timeout` to `CeleryExecutor` [airflow]
antonlin1 closed pull request #57832: add `task_pending_timeout` to `CeleryExecutor`
URL: https://github.com/apache/airflow/pull/57832

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] add `task_pending_timeout` to `CeleryExecutor` [airflow]
zach-overflow commented on code in PR #57832:
URL: https://github.com/apache/airflow/pull/57832#discussion_r2496454143

## providers/celery/src/airflow/providers/celery/executors/celery_executor.py:

```diff
@@ -443,8 +472,26 @@ def update_task_state(self, key: TaskInstanceKey, state: str, info: Any) -> None
             self.success(key, info)
         elif state in (celery_states.FAILURE, celery_states.REVOKED):
             self.fail(key, info)
-        elif state in (celery_states.STARTED, celery_states.PENDING, celery_states.RETRY):
-            pass
+        elif state == celery_states.PENDING:
+            if self._pending_task_timed_out(key):
+                pending_duration = time.monotonic() - self.task_pending_since[key]
+                self.log.warning(
+                    "Task %s has been PENDING in Celery for %.1f seconds (timeout: %d seconds). "
+                    "Failing task. This typically indicates the task was sent to a non-existent "
+                    "queue or no workers are available to pick it up.",
```

Review Comment:
   Any reason for using the `%`-substitution instead of f-strings here?
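For context on the question above: the standard rationale for `%`-substitution in `logging` calls is that the logger defers interpolation until it knows the record will actually be emitted, whereas an f-string is built eagerly even if the record is dropped. A minimal standalone illustration (the `Expensive` class and logger name are hypothetical, not from the PR):

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("celery_executor_demo")

class Expensive:
    """Tracks whether its string conversion was ever performed."""
    def __init__(self):
        self.rendered = False
    def __str__(self):
        self.rendered = True
        return "rendered"

obj = Expensive()

# %-style: args are interpolated only if the record passes the level check.
# DEBUG is below WARNING, so __str__ is never called here.
log.debug("value: %s", obj)
assert obj.rendered is False

# f-string: the message is built eagerly, before logging decides to drop it.
log.debug(f"value: {obj}")
assert obj.rendered is True
```

A secondary benefit is that log-aggregation tools can group records by the constant message template when `%`-style is used.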
Re: [PR] add `task_pending_timeout` to `CeleryExecutor` [airflow]
zach-overflow commented on code in PR #57832:
URL: https://github.com/apache/airflow/pull/57832#discussion_r2496449078

## providers/celery/src/airflow/providers/celery/executors/celery_executor.py:

```diff
@@ -435,6 +438,32 @@ def change_state(
     ) -> None:
         super().change_state(key, state, info, remove_running=remove_running)
         self.tasks.pop(key, None)
+        self.task_pending_since.pop(key, None)
+
+    def _pending_task_timed_out(self, key: TaskInstanceKey) -> bool:
+        """
+        Check if a task has been stuck in celery_states.PENDING state beyond the timeout threshold.
+
+        This guards against a task leak scenario:
+          1. Celery executor sends task to a specific celery queue
+          2. Celery worker is unavailable for the given queue. Celery task is in PENDING state.
```

Review Comment:
   maybe to clarify this a bit, and how this failure mode can occur even if workers are technically "available"
   ```suggestion
             2. No Celery workers are consuming from the given queue. Celery task will remain in PENDING state.
   ```
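The bookkeeping the diff above adds can be sketched in isolation as follows. This is a hypothetical standalone version, not the PR's code: `task_pending_since` and `task_pending_timeout` mirror names from the diff, but the surrounding `CeleryExecutor` (and its `TaskInstanceKey` keys) is stubbed out with plain strings, and `note_pending`/`clear` are illustrative helpers standing in for the executor's state-sync and `change_state` paths.

```python
import time

class PendingTracker:
    """Sketch of the PENDING-timeout guard: track when a task was first
    seen PENDING, and flag it once it exceeds the timeout threshold."""

    def __init__(self, task_pending_timeout: float = 600.0):
        self.task_pending_timeout = task_pending_timeout
        self.task_pending_since: dict[str, float] = {}

    def note_pending(self, key: str) -> None:
        # Record only the *first* time we observe the task in PENDING,
        # using a monotonic clock so wall-clock changes cannot skew it.
        self.task_pending_since.setdefault(key, time.monotonic())

    def pending_task_timed_out(self, key: str) -> bool:
        # Times out once the task has sat in PENDING longer than the
        # threshold, e.g. because no worker consumes from its queue.
        since = self.task_pending_since.get(key)
        if since is None:
            return False
        return (time.monotonic() - since) > self.task_pending_timeout

    def clear(self, key: str) -> None:
        # Mirrors change_state(): drop bookkeeping once the task leaves PENDING.
        self.task_pending_since.pop(key, None)

# Usage: a tiny timeout so the example completes quickly.
tracker = PendingTracker(task_pending_timeout=0.01)
tracker.note_pending("ti-1")
time.sleep(0.05)
assert tracker.pending_task_timed_out("ti-1")
tracker.clear("ti-1")
assert not tracker.pending_task_timed_out("ti-1")
```

The monotonic clock matters here: `time.time()` can jump backwards on NTP adjustments, which would silently extend or shorten the timeout.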
