Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2024-04-25 Thread via GitHub
dstandish commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-2077943123 @changqian9 it looks like the original author may have gotten busy or lost motivation. Anyone could pick it up and see it through if motivated. -- This is an automated message from the Apa

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2024-04-14 Thread via GitHub
changqian9 commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-2053953506 Can anyone look into this PR? In Airflow 2.8.x users face an executor leak issue, and I faced the same. Could the above PR fix it? https://github.com/apache/airflow/issues/36998 http

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2024-04-13 Thread via GitHub
github-actions[bot] commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-2053813733 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for you

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2024-02-26 Thread via GitHub
dstandish commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1503406574 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2024-02-26 Thread via GitHub
dstandish commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1965382244 Are there any other open questions on this one?

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2024-01-16 Thread via GitHub
eladkal commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1893295651 @droppoint can you address @hussein-awala's comment? It looks like this PR is almost complete. I hope to get it merged for the next release.

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-23 Thread via GitHub
hussein-awala commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1435740228 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434655981 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dstandish commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434388575 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434146117 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434158338 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
potiuk commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434054803 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434050571 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434044047 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1434039826 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r143469 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
potiuk commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433937755 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433930848 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433776152 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433692335 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-21 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433693460 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-20 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433671156 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-20 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1433383053 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-20 Thread via GitHub
dirrao commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1864888787 > Nice findings. Looks promising. Thanks for that. > > Ok here's another scenario. > > The task OOMs and therefore cannot report its state by itself. I believe in this

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-20 Thread via GitHub
dirrao commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1432987474 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,38 +642,37 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-20 Thread via GitHub
potiuk commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1864650545 Hey @hussein-awala @dstandish -> would love to get that one merged, maybe you two can take a look. I will **just** be releasing the cncf.k8s provider as well, I think, and I think together with #

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-20 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1432808345 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,39 +641,6 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-12-07 Thread via GitHub
droppoint commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1845545959 We've refactored the _adopt_completed_pods function into the _delete_orphaned_completed_pods function, and now it properly removes completed pods from failed schedulers. Here's a
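
For readers skimming the thread, the idea described in this comment (only touching completed pods whose owning scheduler has failed) can be sketched roughly as follows. This is a toy model, not the code from the PR: the `Pod` class, its `scheduler_id` field, and `orphaned_completed_pods` are hypothetical names introduced only for illustration, and no Kubernetes or Airflow API is called.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Pod:
    """Hypothetical stand-in for a worker pod (not a Kubernetes client object)."""
    name: str
    phase: str         # e.g. "Running", "Succeeded", "Failed"
    scheduler_id: str  # assumed label recording which scheduler launched the pod


# Phases treated as "completed" in this sketch.
COMPLETED_PHASES = {"Succeeded", "Failed"}


def orphaned_completed_pods(pods: Iterable[Pod], alive_scheduler_ids: Set[str]) -> List[Pod]:
    """Return completed pods whose launching scheduler is no longer alive.

    Completed pods that belong to a still-running scheduler are left alone,
    so two healthy schedulers never compete for the same pod.
    """
    return [
        pod
        for pod in pods
        if pod.phase in COMPLETED_PHASES and pod.scheduler_id not in alive_scheduler_ids
    ]


pods = [
    Pod("task-a", "Succeeded", "scheduler-1"),  # owner alive: skipped
    Pod("task-b", "Succeeded", "scheduler-2"),  # owner failed: selected
    Pod("task-c", "Running", "scheduler-2"),    # not completed: skipped
]
to_delete = orphaned_completed_pods(pods, alive_scheduler_ids={"scheduler-1"})
print([pod.name for pod in to_delete])
```

How the set of alive schedulers is obtained is left out on purpose; the thread does not show it in the visible text.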

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-30 Thread via GitHub
dstandish commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1834060063 Nice findings. Looks promising. Thanks for that. Ok here's another scenario. The task OOMs and therefore cannot report its state by itself.

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-30 Thread via GitHub
droppoint commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1833844894 > Hi @droppoint let us know what you find My team and I ran an experiment that demonstrated that even if the scheduler shuts down abnormally, the TaskInstance still completes no

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-28 Thread via GitHub
potiuk commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1831115251 > Hi @droppoint let us know what you find Yeah. I might even find some time to follow the discussion and contribute to it as well soon.

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-28 Thread via GitHub
dstandish commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1830161663 > The completed pods are not a problem because they consume no resources, and they will be deleted during the airflow cleanup-pods cronjob execution. However, a TaskInstance can get s

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-24 Thread via GitHub
JCoder01 commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1404509646 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,39 +641,6 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-24 Thread via GitHub
droppoint commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1404083513 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,39 +641,6 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-24 Thread via GitHub
ephraimbuddy commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1404052934 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,39 +641,6 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-23 Thread via GitHub
JCoder01 commented on code in PR #35800: URL: https://github.com/apache/airflow/pull/35800#discussion_r1403249411 ## airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py: ## @@ -642,39 +641,6 @@ def adopt_launched_task( del tis_to_flush_by_key[ti_key]

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-23 Thread via GitHub
dstandish commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1824655540 In case it is important to adopt pods, but we need a fix to do it safely, just throwing out some ideas before I disappear for the holiday. The scheduler can know which other schedul

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-23 Thread via GitHub
droppoint commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1824311383 The completed pods are not a problem because they consume no resources, and they will be deleted during the airflow cleanup-pods cronjob execution. However, a TaskInstance can get stu

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-22 Thread via GitHub
dstandish commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1823192097 Ok, so the fix is: don't try to adopt completed pods. You mention that there is no big consequence if we don't do this, because we can periodically delete completed pods in a cron job. B

Re: [PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-22 Thread via GitHub
boring-cyborg[bot] commented on PR #35800: URL: https://github.com/apache/airflow/pull/35800#issuecomment-1823024487 Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything please check our Contribution

[PR] Fix race condition in KubernetesExecutor with concurrently running schedulers [airflow]

2023-11-22 Thread via GitHub
droppoint opened a new pull request, #35800: URL: https://github.com/apache/airflow/pull/35800 Closes: #32928 A race condition occurs in the _adopt_completed_pods function when schedulers are running concurrently. The _adopt_completed_pods function doesn't keep track of which scheduler w
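
A toy illustration of this kind of race may help. The description above is cut off, so the mechanism here is an assumption rather than a summary of the PR: it supposes that a scheduler claims every completed pod it does not already own, without checking whether the current owner is still running. The dictionary layout, the `owner` key, and `adopt_completed_unconditionally` are hypothetical and do not correspond to Airflow or Kubernetes APIs.

```python
def adopt_completed_unconditionally(pods, my_id):
    """Claim every completed pod not owned by my_id; return how many were claimed.

    Nothing here checks whether the current owner is still alive, which is
    the assumed flaw this sketch is meant to show.
    """
    adopted = 0
    for pod in pods:
        if pod["phase"] == "Succeeded" and pod["owner"] != my_id:
            pod["owner"] = my_id
            adopted += 1
    return adopted


# One completed pod, launched by scheduler-1, which is still healthy.
pods = [{"name": "task-a", "phase": "Succeeded", "owner": "scheduler-1"}]

# Two healthy schedulers take turns running their adoption pass.
history = []
for scheduler in ["scheduler-2", "scheduler-1", "scheduler-2"]:
    adopt_completed_unconditionally(pods, scheduler)
    history.append(pods[0]["owner"])

# Ownership bounces back and forth instead of settling.
print(history)
```

In this model neither scheduler ever finishes handling the pod, because each pass undoes the previous one. Restricting the pass to pods whose owner is known to have failed removes the contention.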