Sergei Iakhnin created AIRFLOW-191:
--------------------------------------

             Summary: Database connection leak on Postgresql backend
                 Key: AIRFLOW-191
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-191
             Project: Apache Airflow
          Issue Type: Bug
          Components: executor
    Affects Versions: Airflow 1.7.1
            Reporter: Sergei Iakhnin


I raised this issue on github several months ago and there was even a PR but it 
never maid it into mainline. Basically, workers tend to hang onto DB 
connections in Postgres for recording heartbeat.

I'm running a cluster with 115 workers, each with 8 slots. My Postgres DB is 
configured to allow 1000 simultaneous connections. I should effectively be able 
to run 920 tasks at the same time, but am actually limited to only about 
450-480 because of idle transactions from workers hanging on to DB connections.

If I run the following query

select count(*),state, client_hostname from pg_stat_activity group by state, 
client_hostname

These are the results:
count state client_hostname
1       active  (null)
1       idle    localhost
451     idle in transaction     (null)
446     idle    (null)
1       active  localhost

The idle connections are all trying to run COMMIT
The "idle in transaction" connections are all trying to run 
SELECT job.id AS job_id, job.dag_id AS job_dag_id, job.state AS job_state, 
job.job_type AS job_job_type, job.start_date AS job_start_date, job.end_date AS 
job_end_date, job.latest_heartbeat AS job_latest_heartbeat, job.executor_class 
AS job_executor_class, job.hostname AS job_hostname, job.unixname AS 
job_unixname 
FROM job 
WHERE job.id = 213823 
 LIMIT 1

with differing job.ids of course.







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to