Re: [I] Kubernetes Executor Task Leak [airflow]

2024-05-16 Thread via GitHub
RNHTTR commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2116430535 Closed by: #39551

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-05-16 Thread via GitHub
RNHTTR closed issue #36998: Kubernetes Executor Task Leak URL: https://github.com/apache/airflow/issues/36998

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-05-10 Thread via GitHub
dirrao commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2105537338 This issue is related to the watcher not being able to scale and process the events in time, which leads to many completed pods accumulating over time. related: https://github.com/apache/airflo

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-05-07 Thread via GitHub
RNHTTR commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2099360465 > Based on my finding above that the KubernetesJobWatcher was running but not getting back any pod changes, I have added a timeout of 5 min so that the watcher restarts itself. This has

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-23 Thread via GitHub
karunpoudel-chr commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2073367112 Based on my finding above that the KubernetesJobWatcher was running but not getting back any pod changes, I have added a timeout of 5 min so that the watcher restarts itself. T
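A minimal sketch of the same idea (not the actual Airflow patch), assuming the standard kubernetes Python client: give the watch stream a timeout so it returns periodically and the loop re-establishes it, instead of hanging on a silently dead connection. The function and namespace names are illustrative only.

```
# Sketch only: re-create the pod watch after a timeout instead of waiting forever.
from kubernetes import client, config, watch

def watch_pods_with_timeout(namespace: str, timeout_seconds: int = 300) -> None:
    config.load_incluster_config()  # use config.load_kube_config() outside the cluster
    v1 = client.CoreV1Api()
    while True:
        w = watch.Watch()
        # timeout_seconds asks the API server to close the watch after ~5 minutes,
        # so a stalled connection cannot block this loop indefinitely.
        for event in w.stream(
            v1.list_namespaced_pod,
            namespace=namespace,
            timeout_seconds=timeout_seconds,
        ):
            pod = event["object"]
            print(event["type"], pod.metadata.name, pod.status.phase)
        # Stream ended (timeout or server closed it): loop around and re-watch.
```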

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-19 Thread via GitHub
karunpoudel-chr commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2066762929 I am seeing the issue in a single namespace. airflow==2.8.4, apache-airflow-providers-cncf-kubernetes==7.14.0, kubernetes==23.6.0. `KubernetesJobWatcher` failed a coup

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-17 Thread via GitHub
paramjeet01 commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2062034550 @crabio, yes, we run in a single namespace.

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-17 Thread via GitHub
crabio commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2062032563 @aru-trackunit Sure: 1. run Airflow with the Kubernetes Executor and 1 scheduler; 2. run some tasks in your default namespace (maybe it is not required) and run tasks in another
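A minimal sketch of step 2, assuming the provider's pod_override mechanism; the DAG id, task id, and "other-namespace" are placeholders, not taken from the issue. For the executor to track pods outside its default namespace, multi_namespace_mode would presumably also need to be enabled.

```
# Illustrative only: send one task to a non-default namespace via pod_override.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from kubernetes.client import models as k8s

with DAG("multi_namespace_repro", start_date=datetime(2024, 1, 1), schedule=None):
    BashOperator(
        task_id="task_in_other_namespace",
        bash_command="sleep 30",
        executor_config={
            "pod_override": k8s.V1Pod(
                metadata=k8s.V1ObjectMeta(namespace="other-namespace")
            )
        },
    )
```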

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-17 Thread via GitHub
aru-trackunit commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2061511376 @crabio Could you please post steps to reproduce the issue? Then I could spend a little bit more time understanding it

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-17 Thread via GitHub
aru-trackunit commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2061097209 I see a similarity between the issue we are facing and the one you describe. Airflow 2.8.4. We run one instance of the scheduler and we observe a list of completed tasks (u

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-16 Thread via GitHub
paramjeet01 commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2059534079 @crabio I have updated my comments here: https://github.com/apache/airflow/issues/38968#issuecomment-2059521327. I was able to improve the performance and the task no longer ha

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-16 Thread via GitHub
crabio commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2058807918 @paramjeet01 Not fully. We still have a slots leak: https://github.com/apache/airflow/assets/40871973/82d34707-fe5c-47d1-b1d7-43ab691985ac But we found a workaround: 1

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-16 Thread via GitHub
paramjeet01 commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2058794814 @crabio, were you able to find a solution? We are also facing the task leak issue in v2.6.3

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-10 Thread via GitHub
crabio commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2046846147 I found a workaround and some insights: if scheduler parallelism is less than the sum of all slots in the pools (32 parallelism and, for example, 8 pools with 8 slots each) and all slots are used -
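A minimal sketch of the mismatch being described, assuming standard Airflow 2.x APIs (conf, Pool, create_session); run it wherever the metadata database is reachable.

```
# Illustrative check: compare [core] parallelism against the total pool slots.
from airflow.configuration import conf
from airflow.models import Pool
from airflow.utils.session import create_session

def check_parallelism_vs_pools() -> None:
    parallelism = conf.getint("core", "parallelism")
    with create_session() as session:
        total_slots = sum(p.slots for p in session.query(Pool).all())
    if parallelism < total_slots:
        print(
            f"parallelism={parallelism} < total pool slots ({total_slots}): "
            "the executor can fill up before the pools do"
        )
    else:
        print(f"parallelism={parallelism} >= total pool slots ({total_slots})")

if __name__ == "__main__":
    check_parallelism_vs_pools()
```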

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-10 Thread via GitHub
crabio commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2046693119 @smhood I have downgraded the Airflow version to 2.7.2, but the issue still exists...
```
apache-airflow == 2.7.2
dbt-core == 1.7.11
dbt-snowflake == 1.7.3
apache-airflow[stats
```

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-09 Thread via GitHub
crabio commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2044656658 Update: I have the same error with 1 scheduler on Airflow 2.8.4, but I think the error may also be in the Kubernetes provider. Libs:
```
apache-airflow == 2.8.4
dbt-
```

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-04-09 Thread via GitHub
crabio commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-2044294154 Hi! I had the same error with 2.8.1 described in https://github.com/apache/airflow/issues/36478, and I tested it on 2.8.4 and the bug still exists

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-22 Thread via GitHub
bixel commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1959360230 It looks like the scheduler or the kubernetes_executor cannot recover from communication issues with kubernetes. I've collected a few hours of logging after a restart of the schedule

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-06 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1929961237 ![image](https://github.com/apache/airflow/assets/14971423/da371484-b1c2-40df-8fe2-6a3dc4cb0763) Able to capture a pretty good log of what we are seeing. Things were working

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-03 Thread via GitHub
aki263 commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1925613030 I am experiencing a similar problem in 2.7.3:
```
[2024-02-04T07:19:20.201+] {scheduler_job_runner.py:1081} DEBUG - Executor full, skipping critical section
[2024-02-04T07:1
```
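A tiny illustration (not Airflow's actual code) of why leaked task keys produce that "Executor full" line: open slots are roughly parallelism minus the tasks the executor still believes are running, so stale entries that are never cleaned up permanently shrink the usable capacity.

```
# Hypothetical numbers; running_task_keys stands in for the executor's internal
# "running" set, including stale entries that were never removed after completion.
parallelism = 32
running_task_keys = {f"leaked_task_{i}" for i in range(32)}

open_slots = parallelism - len(running_task_keys)
if open_slots <= 0:
    print("Executor full, skipping critical section")
```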

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-02 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1924671652 @dirrao do I have to change labels in order to get follow-up?

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-02 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1924459837 Additional logging showing what this looks like as far as open slots: ![image](https://github.com/apache/airflow/assets/14971423/3e41b474-2d3c-4dd3-8492-434495a7d1e3)

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-02 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1924436977 Looking into the latest occurrence, what is weird is that we are seeing the following logged event: ![image](https://github.com/apache/airflow/assets/14971423/de500855-71d8-4744-b5f3-

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-02-02 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1923838162 After briefly seeing things working again, we are now seeing the issue again. Single scheduler running on the 1.11 Helm chart, Airflow 2.8.1.

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-31 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1919164197 So we are actually starting to see things work now, potentially. We were using an old version of the Helm chart, and after upgrading from 1.10 to 1.11 we are seeing the executors

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-25 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1910695514 Also, we are building our own Airflow image and importing our DAGs there. Dockerfile:
```
FROM apache/airflow:2.8.1-python3.11
USER root
RUN apt-get update && apt-ge
```

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-25 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1910559014 We have been seeing this issue basically ever since we upgraded from 2.7.3 -> 2.8.0 (and now on 2.8.1)

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-25 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1910544611 > Are you seeing this issue when you run Airflow with a single scheduler? Can you share the details to reproduce it? > > This requires triaging. Meanwhile, you can bump up the

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-25 Thread via GitHub
dirrao commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1910537655 Are you seeing this issue when you run Airflow with a single scheduler? Can you share the details to reproduce it? This requires triaging. Meanwhile, you can bump up the para

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-25 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1910522696 It seems like whenever our executor is checking the state of the task instance, it's not being updated from the database. Looking at the database entry, it's clearly marked as su
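A minimal sketch of that cross-check, assuming the standard Airflow 2.x ORM models; the dag_id, task_id, and run_id values are placeholders.

```
# Read the state the metadata database actually records for one task instance,
# to compare with what the executor believes.
from airflow.models import TaskInstance
from airflow.utils.session import create_session

with create_session() as session:
    ti = (
        session.query(TaskInstance)
        .filter(
            TaskInstance.dag_id == "example_dag",
            TaskInstance.task_id == "example_task",
            TaskInstance.run_id == "manual__2024-01-25T00:00:00+00:00",
        )
        .one_or_none()
    )
    print(ti.state if ti else "no matching task instance")
```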

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-25 Thread via GitHub
smhood commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1910408593 Looking over the logs, I get two different outcomes. When I restart the pods I get the following:
```
[2024-01-25T13:20:23.100+] {scheduler_job_runner.py:696} INFO - Rece
```

Re: [I] Kubernetes Executor Task Leak [airflow]

2024-01-24 Thread via GitHub
boring-cyborg[bot] commented on issue #36998: URL: https://github.com/apache/airflow/issues/36998#issuecomment-1908253016 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for ap

[I] Kubernetes Executor Task Leak [airflow]

2024-01-24 Thread via GitHub
smhood opened a new issue, #36998: URL: https://github.com/apache/airflow/issues/36998

### Apache Airflow version

2.8.1

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

Scheduler stops processing DAGs and moving them to