[PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-18 Thread via GitHub
dirrao opened a new pull request, #36882: URL: https://github.com/apache/airflow/pull/36882 What happened: When the K8 executor is unable to launch the worker pod due to permission issues or an invalid namespace, it keeps trying to launch the worker pod and the errors rem

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
shohamy7 commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1459077084 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -436,7 +436,7 @@ def sync(self) -> None: except ApiException as e:

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1459273598 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -436,7 +436,7 @@ def sync(self) -> None: except ApiException as e:

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
eladkal commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1901185211 Just to clarify this also solves https://github.com/apache/airflow/issues/35792 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
jedcunningham commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1901186517 > Just to clarify this also solves #35792 ? Yes, it would.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
jedcunningham commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1901190184 It might be worth adding a note in the changelog about this behavior change, so folks can reevaluate if they need to enable/increase retries.
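
The behavior change discussed here (a task that fails instead of staying queued forever once pod launches keep erroring) can be illustrated with a minimal sketch. This is not the code from the PR: `PublishRetryTracker`, `record_failure`, and `max_retries` are hypothetical names chosen for illustration, and the sketch assumes the executor counts failed launch attempts per task key and gives up after a configured limit.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class PublishRetryTracker:
    """Hypothetical helper: counts failed pod launch attempts per task."""

    # Assumed limit on how many times a failed launch is retried.
    max_retries: int = 3
    _attempts: Dict[Hashable, int] = field(default_factory=dict)

    def record_failure(self, task_key: Hashable) -> str:
        """Return "requeue" while retries remain, otherwise "failed"."""
        attempts = self._attempts.get(task_key, 0) + 1
        if attempts > self.max_retries:
            # Retries exhausted: forget the task and report it as failed
            # so it does not sit in the queued state indefinitely.
            self._attempts.pop(task_key, None)
            return "failed"
        self._attempts[task_key] = attempts
        return "requeue"
```

With `max_retries=2`, the first two launch failures for a task return `"requeue"` and the third returns `"failed"`. Because a task can now end up failed where it previously stayed queued, task-level retries may need to be enabled or increased, as the comment above suggests.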

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
eladkal commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1901194151 > It might be worth adding a note in the changelog about this behavior change, so folks can reevaluate if they need to enable/increase retries. Agree. @dirrao can you please add n

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
hussein-awala commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1460108973 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
dirrao commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1901747231 > > It might be worth adding a note in the changelog about this behavior change, so folks can reevaluate if they need to enable/increase retries. > > Agree. @dirrao can you please

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-19 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1460214073 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-20 Thread via GitHub
shohamy7 commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1460536555 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) sel

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-20 Thread via GitHub
hussein-awala commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1460539740 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-20 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1460550276 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-22 Thread via GitHub
jedcunningham commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1462584433 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-22 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1462668330 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-23 Thread via GitHub
hussein-awala commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1463107125 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-23 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1463426729 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-23 Thread via GitHub
shohamy7 commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1463452920 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) sel

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-23 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1463613047 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-23 Thread via GitHub
amoghrajesh commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1464287734 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-24 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1465782241 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-25 Thread via GitHub
jedcunningham commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1466816732 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-27 Thread via GitHub
hussein-awala commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1468587468 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-28 Thread via GitHub
hterik commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1469164669 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-28 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1469165496 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-29 Thread via GitHub
chenyair commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1914986603 I see you added a retries counter. What do you think about also adding a custom delay between each retry when the quota is exceeded? My issue is the high rate of requests to the Kubernetes API and currently i
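
One way to picture the delay suggested here is a capped exponential backoff between launch attempts. The sketch below is an assumption for illustration only and is not something the PR is known to implement: `retry_delay`, `base_seconds`, and `cap_seconds` are hypothetical names.

```python
def retry_delay(attempt: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> float:
    """Return the wait before retry number `attempt` (1-based).

    The delay doubles on each attempt and is capped at `cap_seconds`,
    which lowers the request rate against the Kubernetes API.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))
```

A caller would sleep for `retry_delay(n)` seconds, or schedule the next attempt that far in the future, before the n-th retry.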

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-29 Thread via GitHub
dirrao commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1915929151 > I see you added a retries counter. What do you think about also adding a custom delay between each retry when the quota is exceeded? My issue is the high rate of requests to the Kubernetes API and currently i

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-29 Thread via GitHub
jedcunningham commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1470543337 ## airflow/providers/cncf/kubernetes/provider.yaml: ## @@ -350,6 +350,15 @@ config: type: string example: ~ default: "" + task_p

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-29 Thread via GitHub
jedcunningham commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1916006948 > > I see you added a retries counter. What do you think about also adding a custom delay between each retry when the quota is exceeded? My issue is the high rate of requests to the Kubernetes API and cu

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-30 Thread via GitHub
dirrao commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1916298950 > > > I see you added a retries counter. What do you think about also adding a custom delay between each retry when the quota is exceeded? My issue is the high rate of requests to the Kubernetes API and current

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-01-30 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1470762084 ## airflow/providers/cncf/kubernetes/provider.yaml: ## @@ -350,6 +350,15 @@ config: type: string example: ~ default: "" + task_publish_

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-01 Thread via GitHub
devscheffer commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1921445378 I had similar problems and thought about something like that ``` from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator from airflo

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-01 Thread via GitHub
dirrao commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1922753144 > I had similar problems and thought about something like that OK. I would suggest using dedicated pool slots per namespace. The pool slots should reflect the namespace resources. So,

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-02 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1476162147 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-02 Thread via GitHub
hussein-awala commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1476561460 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,19 +438,35 @@ def sync(self) -> None: )

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-02 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1476927531 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,19 +438,35 @@ def sync(self) -> None: ) sel

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-02 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1476927807 ## airflow/providers/cncf/kubernetes/provider.yaml: ## @@ -350,6 +350,15 @@ config: type: string example: ~ default: "" + task_publish_

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-03 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1477173486 ## airflow/providers/cncf/kubernetes/provider.yaml: ## @@ -350,6 +350,15 @@ config: type: string example: ~ default: "" + task_publish_

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-03 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1477173614 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,9 +434,9 @@ def sync(self) -> None: ) self.

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-03 Thread via GitHub
dirrao commented on code in PR #36882: URL: https://github.com/apache/airflow/pull/36882#discussion_r1477173515 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -434,19 +438,35 @@ def sync(self) -> None: ) sel

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-03 Thread via GitHub
dirrao commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1925609810 @potiuk / @hussein-awala A Celery executor test unrelated to this PR is failing. Is this due to a recent change? `tests/integration/executors/test_celery_executor.py::TestCeleryExecutor::test_

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-07 Thread via GitHub
dirrao commented on PR #36882: URL: https://github.com/apache/airflow/pull/36882#issuecomment-1932418167 @hussein-awala Can you re-review it?

Re: [PR] The task is stuck in a queued state forever in case of pod launch errors [airflow]

2024-02-10 Thread via GitHub
potiuk merged PR #36882: URL: https://github.com/apache/airflow/pull/36882