Re: High load in CPU of MySQL when running airflow

2017-03-08 Thread Maxime Beauchemin
I think you put your finger on it. If you have a frequent query against a large-ish table that cannot leverage an index, that will result in a lot of workload. If I were in your shoes, I'd run a CREATE INDEX statement against that table/field and see how it reduces your resource consumption and make
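A minimal sketch of the effect described above, not taken from the thread. It uses Python's built-in sqlite3 module rather than MySQL so that it runs anywhere. The table is a simplified stand-in for task_instance, and the index name ti_state and the sample data are assumptions made for illustration. It compares the query plan for a filter on state before and after CREATE INDEX.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL metadata DB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)

# Illustrative data: mostly finished rows, a few still queued.
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("task_%d" % i, "example_dag", "2017-03-07",
         "success" if i % 100 else "queued")
        for i in range(10000)
    ],
)

query = "SELECT task_id FROM task_instance WHERE state = 'queued'"


def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # full table scan: no usable index yet

# Hypothetical index name; the point is only that `state` becomes indexed.
conn.execute("CREATE INDEX ti_state ON task_instance (state)")

plan_after = plan(query)  # the plan now references the new index

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent check would be to run EXPLAIN on the slow query before and after creating the index, ideally on a copy of the database first.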

Re: High load in CPU of MySQL when running airflow

2017-03-08 Thread Jason Chen
Thanks for the reply. We are using 1.7.1.3 and it looks like the index is not there. https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/models.py#L660-#L664 Is Airflow 1.8 officially released? I saw the version tag and discussion, but did not see it on PyPI. I did run Dan's SQL statement

Re: High load in CPU of MySQL when running airflow

2017-03-08 Thread Maxime Beauchemin
Wait. That field does have an index and it looks like Dan added it 8 months ago. https://github.com/apache/incubator-airflow/blame/master/airflow/models.py#L744 Here's the related DB migration script: https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/211e584da130_

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Dan Davydov
We will need to come up with a plan soon (better DB indexes and/or the ability to rotate out old task instances according to some policy). Nothing concrete as of yet though.
On Tue, Mar 7, 2017 at 6:18 PM, Jason Chen wrote:
> Hi Dan,
>
> Thanks so much. This is exactly what I am looking for.
>

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
Hi Dan, Thanks so much. This is exactly what I am looking for. Is there a plan on the Airflow roadmap to clean this up at the Airflow system level? Say, in airflow.cfg, a setting to clean up data older than a specified time. Your solution is to run an airflow job to clean up the data. That'

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Dan Davydov
FWIW we use the following DAG at Airbnb to reap the task instances table (this is a stopgap):
# DAG to delete old TIs so that UI operations on the webserver are fast. This DAG is a
# stopgap, ideally we would make the UI not query all task instances and add indexes to
# the task_instance table whe
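The DAG itself is cut off in this archive view, so the following is not Dan's code. It is only a minimal sketch of the general idea (deleting task instance rows older than a cutoff, in small batches), written against Python's built-in sqlite3 so that it is self-contained. The function name reap_old_task_instances, the 30-day retention, the batch size, and the simplified table are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta


def reap_old_task_instances(conn, now, max_age_days=30, batch_size=1000):
    """Delete task_instance rows older than `max_age_days`, in batches.

    Hypothetical helper: batching keeps each transaction short so a large
    cleanup does not hold long locks on the table.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    total_deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM task_instance WHERE rowid IN ("
            "SELECT rowid FROM task_instance "
            "WHERE execution_date < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing older than the cutoff is left
        total_deleted += cur.rowcount
    return total_deleted


# Demo on an in-memory table with one row per day for the last 60 days.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)
now = datetime(2017, 3, 7)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("example_task", "example_dag",
         (now - timedelta(days=d)).isoformat(), "success")
        for d in range(60)
    ],
)

deleted = reap_old_task_instances(conn, now, max_age_days=30, batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM task_instance").fetchone()[0]
print(deleted, remaining)
```

In a real deployment the same logic would run on a schedule against the metadata database, with the retention period chosen to match how much history the UI needs to show.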

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
Hi Bolke, Thanks, but it looks like you are actually talking about Harish's use case. My use case is about 50 DAGs (each one with about 2-3 tasks). I feel our run interval setting for the DAGs is too low (~15 mins). It may result in high CPU usage in MySQL. Meanwhile, I dug into MySQL and I noticed a fre

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Bolke de Bruin
Hi Jason, I think you need to back it up with more numbers. You assume that a load of 100% is bad and also that 16GB of memory is a lot. 30x25 = 750 tasks per hour = 12.5 tasks per minute. For every task we launch a couple of processes (at least 2) that do not share memory; this is to ensure tasks
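The arithmetic above, written out as a small sketch. Reading "30x25" as 30 DAGs with 25 task runs each per hour is an assumption about what the two factors mean; the variable names are illustrative.

```python
# Assumed reading of "30x25": 30 DAGs, each producing 25 task runs per hour.
dags = 30
task_runs_per_dag_per_hour = 25
min_processes_per_task = 2  # "a couple of processes (at least 2)"

tasks_per_hour = dags * task_runs_per_dag_per_hour
tasks_per_minute = tasks_per_hour / 60
min_process_launches_per_minute = tasks_per_minute * min_processes_per_task

print(tasks_per_hour)                   # 750
print(tasks_per_minute)                 # 12.5
print(min_process_launches_per_minute)  # 25.0
```

So even before counting the scheduler's own queries, this workload implies at least 25 process launches per minute, each with its own memory footprint.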

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
I see. Thanks. Airflow team, I noticed a frequently running SQL query, shown below. It runs without a proper index on column task_instance.state. Shouldn't "state" be indexed, given that there could be millions of rows in task_instance? "SELECT task_instance.task_id AS task_instance_task_id, task_instance.dag_id A

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread harish singh
It does and does not. Say scheduler heartbeat = 30 sec: you will see a spiky CPU consumption graph every 30 seconds. But we did not go that route and kept the scheduler heartbeat = 5 sec so that we do not lose time when a task is ready to run (I think there is another known bug here - tasks dont

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread Jason Chen
Hi Harish, Thanks for the fast response and feedback. Yeah, I want to see the fix or more discussion! BTW, I assume that, given your 30 DAGs, Airflow runs fine after you increased the heartbeat? The default is 5 secs. Thanks. Jason
On Tue, Mar 7, 2017 at 10:24 AM, harish singh wrote:
> I

Re: High load in CPU of MySQL when running airflow

2017-03-07 Thread harish singh
I had seen similar behavior a year ago, when we were at < 5 DAGs. Even then the CPU utilization was reaching 100%. One way to deal with this is to play with the "heartbeat" numbers (i.e. increase the heartbeat). But then you are introducing more delay to start jobs that are ready to run (ready t