Sergei Iakhnin created AIRFLOW-191: -------------------------------------- Summary: Database connection leak on Postgresql backend Key: AIRFLOW-191 URL: https://issues.apache.org/jira/browse/AIRFLOW-191 Project: Apache Airflow Issue Type: Bug Components: executor Affects Versions: Airflow 1.7.1 Reporter: Sergei Iakhnin
I raised this issue on github several months ago and there was even a PR but it never maid it into mainline. Basically, workers tend to hang onto DB connections in Postgres for recording heartbeat. I'm running a cluster with 115 workers, each with 8 slots. My Postgres DB is configured to allow 1000 simultaneous connections. I should effectively be able to run 920 tasks at the same time, but am actually limited to only about 450-480 because of idle transactions from workers hanging on to DB connections. If I run the following query select count(*),state, client_hostname from pg_stat_activity group by state, client_hostname These are the results: count state client_hostname 1 active (null) 1 idle localhost 451 idle in transaction (null) 446 idle (null) 1 active localhost The idle connections are all trying to run COMMIT The "idle in transaction" connections are all trying to run SELECT job.id AS job_id, job.dag_id AS job_dag_id, job.state AS job_state, job.job_type AS job_job_type, job.start_date AS job_start_date, job.end_date AS job_end_date, job.latest_heartbeat AS job_latest_heartbeat, job.executor_class AS job_executor_class, job.hostname AS job_hostname, job.unixname AS job_unixname FROM job WHERE job.id = 213823 LIMIT 1 with differing job.ids of course. -- This message was sent by Atlassian JIRA (v6.3.4#6332)