Hey all,
I have sent in a PR and JIRA here:
https://github.com/apache/incubator-airflow/pull/2021
https://issues.apache.org/jira/browse/AIRFLOW-807
Please have a look.
EDIT: I see Arthur just did haha.
Cheers,
Chris
On Tue, Jan 24, 2017 at 9:41 PM, Chris Riccomini wrote:
@Max, ran both ANALYZE and OPTIMIZE. Didn't help. EXPLAIN still tries to use
the `state` index. :(
On Tue, Jan 24, 2017 at 5:54 AM, Bolke de Bruin wrote:
I have looked into the issue and it is harmless. What happens is that a
TaskInstance writes “success” to the database and the monitoring catches this
change before the process has exited. It reports the wrong state (i.e. queued)
because self.task_instance is not updated. I have opened AIRFLOW-798, but will not
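A minimal, self-contained sketch of the race described above (illustrative
only, not Airflow's actual LocalTaskJob code; every name in it is
hypothetical):

import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("local_task_monitor")

# Stand-in for the task_instance table: the task process has just
# committed "success" but has not yet exited.
fake_db = {"ti_1": "success"}

class LocalTaskMonitor:
    """Monitors one running task, holding a cached copy of its state."""

    def __init__(self, ti_id):
        self.ti_id = ti_id
        self.expected_state = "running"
        self.cached_state = "queued"  # stale; never refreshed (the bug)

    def heartbeat(self):
        db_state = fake_db[self.ti_id]  # fresh read returns "success"
        if db_state != self.expected_state:
            # The warning reports the stale cached state rather than
            # db_state, producing the confusing "externally set to queued"
            # message even though the task simply finished.
            log.warning("State of this instance has been externally set "
                        "to %s. Taking the poison pill. So long.",
                        self.cached_state)

LocalTaskMonitor("ti_1").heartbeat()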
Hey Chris,
Could you dive into the below a bit more? I don’t like that the LocalTaskJob
is saying the external state is set to queued, although it might just be that
the monitoring does not take the queued state into account, which it should
(still, I am wondering why it happens after the t
Can you rebuild your indexes and recompute the table's stats, and see if the
optimizer is still off track?
Assuming InnoDB and from memory:
OPTIMIZE TABLE task_instance;
ANALYZE TABLE task_instance;
Max
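To verify whether the rebuild changed the plan, one could run EXPLAIN on the
problematic query and inspect the `key` column. A sketch only: the connection
URI and the literal values are placeholders, and the query text is taken from
the one quoted further down in this thread:

import sqlalchemy
from sqlalchemy import create_engine, text

# Placeholder URI; point this at the Airflow metadata database.
engine = create_engine("mysql://user:pass@localhost/airflow")

explain = text("""
    EXPLAIN
    SELECT task_instance.task_id, MAX(task_instance.execution_date)
    FROM task_instance
    WHERE task_instance.dag_id = :dag_id
      AND task_instance.state = 'success'
      AND task_instance.task_id IN ('t1', 't2')
    GROUP BY task_instance.task_id
""")

with engine.connect() as conn:
    for row in conn.execute(explain, {"dag_id": "dag1"}):
        # The 'key' column shows the chosen index; if it still names the
        # `state` index rather than PRIMARY, the optimizer is still off
        # track after OPTIMIZE/ANALYZE.
        print(row)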
On Mon, Jan 23, 2017 at 3:45 PM, Arthur Wiedmer wrote:
Maybe we can start with
" .with_hint(TI, 'USE INDEX (PRIMARY)', dialect_name='mysql')"
and see if other databases exhibit the same query plan issue?
Best,
Arthur
On Mon, Jan 23, 2017 at 3:27 PM, Chris Riccomini wrote:
With this patch:
$ git diff
diff --git a/airflow/jobs.py b/airflow/jobs.py
index f1de333..9d08e75 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -544,6 +544,7 @@ class SchedulerJob(BaseJob):
             .query(
                 TI.task_id,
                 func.max(TI.execution_date).label
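The diff is cut off in the archive. Reconstructed from the inner SQL quoted
below and Arthur's `.with_hint` suggestion, the hinted query plausibly looks
something like the following sketch (not the literal patch from the PR; TI is
Airflow's TaskInstance model, and `session` is assumed to be a SQLAlchemy
session):

from sqlalchemy import func

def max_success_execution_dates(session, TI, dag_id, task_ids):
    # Inner query: latest successful execution_date per task_id.
    return (
        session
        .query(TI.task_id, func.max(TI.execution_date).label('max_ti'))
        # Steer MySQL onto the PRIMARY key instead of the low-selectivity
        # `state` index; other backends ignore the hint because of
        # dialect_name='mysql'.
        .with_hint(TI, 'USE INDEX (PRIMARY)', dialect_name='mysql')
        .filter(TI.dag_id == dag_id)
        .filter(TI.state == 'success')
        .filter(TI.task_id.in_(task_ids))
        .group_by(TI.task_id)
        .subquery('sq')
    )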
OK, it's using `state` instead of PRIMARY. Using PRIMARY with a hint, the query
takes 0.47s. Without the hint, 10s. Going to try and patch.
On Mon, Jan 23, 2017 at 2:57 PM, Chris Riccomini wrote:
This inner query takes 10s:
SELECT task_instance.task_id AS task_id,
       max(task_instance.execution_date) AS max_ti
FROM task_instance
WHERE task_instance.dag_id = 'dag1'
  AND task_instance.state = 'success'
  AND task_instance.task_id IN ('t1', 't2')
GROUP BY task_instance.task_id
Explain seems OK:
(EXPLAIN output truncated)
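A quick way to reproduce the 10s vs sub-second gap mentioned above is to time
the query with and without an index hint. A sketch only: the connection URI
and the literal values are placeholders:

import time
from sqlalchemy import create_engine, text

engine = create_engine("mysql://user:pass@localhost/airflow")

QUERY = """
    SELECT task_instance.task_id, max(task_instance.execution_date) AS max_ti
    FROM task_instance {hint}
    WHERE task_instance.dag_id = 'dag1'
      AND task_instance.state = 'success'
      AND task_instance.task_id IN ('t1', 't2')
    GROUP BY task_instance.task_id
"""

with engine.connect() as conn:
    for hint in ("", "USE INDEX (PRIMARY)"):
        start = time.perf_counter()
        # Run the same query unhinted, then forced onto the PRIMARY key.
        conn.execute(text(QUERY.format(hint=hint))).fetchall()
        print(hint or "(no hint)", time.perf_counter() - start)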
It's this query:
SELECT task_instance.task_id AS task_instance_task_id,
       task_instance.dag_id AS task_instance_dag_id,
       task_instance.execution_date AS task_instance_execution_date,
       task_instance.start_date AS task_instance_start_date,
       task_instance.end_date AS task_instance_end_date,
       task_instance.
Can confirm it's a slow query on the task_instance table. Still digging.
Unfortunately, the query is truncated in my UI right now:
SELECT task_instance.task_id AS task_instance_...
On Mon, Jan 23, 2017 at 1:56 PM, Chris Riccomini wrote:
Digging. Might be a bit.
On Mon, Jan 23, 2017 at 1:32 PM, Bolke de Bruin wrote:
Slow query log? Db load?
B.
Sent from my iPad
On Jan 23, 2017, at 21:59, Chris Riccomini wrote:
Note: 6.5 million TIs in the task_instance table.
On Mon, Jan 23, 2017 at 12:58 PM, Chris Riccomini wrote:
Hey Bolke,
Re: system usage, it's pretty quiet: <5% CPU usage. Mem is almost all free
as well.
I am thinking that this is DB related, given that it's pausing when
executing an update. Was looking at the update_state method in models.py,
which logs right before the 15s pause.
Cheers,
Chris
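One generic way to confirm that a pause like this is spent inside a SQL
statement is to hook SQLAlchemy's cursor events and log any statement that
exceeds a threshold. This is a sketch of a standard SQLAlchemy recipe, not
something from this thread; it assumes the listeners are registered in the
same process that owns the metadata DB engine:

import logging
import time

from sqlalchemy import event
from sqlalchemy.engine import Engine

log = logging.getLogger("slow_queries")
SLOW_THRESHOLD_SECS = 1.0

@event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement, parameters, context,
                          executemany):
    # Record the wall-clock start time on the connection.
    conn.info.setdefault("query_start_time", []).append(time.time())

@event.listens_for(Engine, "after_cursor_execute")
def after_cursor_execute(conn, cursor, statement, parameters, context,
                         executemany):
    elapsed = time.time() - conn.info["query_start_time"].pop()
    if elapsed > SLOW_THRESHOLD_SECS:
        # A 15s pause around update_state would show up here with the
        # offending statement attached.
        log.warning("Slow query (%.2fs): %s", elapsed, statement)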
On Mon
Hi Chris,
What is the load on your machine (CPU/IO/MEM)? It seems the executor checks
the state faster than the TaskInstance is able to exit. No, I don’t consider
it normal, but it has been reported before
(https://github.com/apache/incubator-airflow/pull/1821), though not real
Oops, sorry again. I misread what you said (did I mention lack of sleep?).
The parsing of the db.py file takes 15 MINUTES (900+ seconds), as shown in
the `last runtime` column. The log lines WITHIN the db.py.log file (from the
scheduler) show 15-second gaps.
On Mon, Jan 23, 2017 at 12:35 PM, Chris Riccomi
Oops, yes, 15 seconds, sorry. Operating without much sleep. :P
On Mon, Jan 23, 2017 at 12:35 PM, Arthur Wiedmer wrote:
Also, seeing this in EVERY task that runs:
[2017-01-23 20:26:13,777] {jobs.py:2112} WARNING - State of this
instance has been externally set to queued. Taking the poison pill. So
long.
[2017-01-23 20:26:13,841] {jobs.py:2051} INFO - Task exited with return code 0
All successful tasks are showing
Chris,
Just double checking, you mean more than 15 seconds not 15 minutes, right?
Best,
Arthur
On Mon, Jan 23, 2017 at 12:27 PM, Chris Riccomini wrote:
Hey all,
I've upgraded on production. Things seem to be working so far (only been an
hour), but I am seeing this in the scheduler logs:
File Path | PID | Runtime | Last Runtime | Last Run
--
I created:
- AIRFLOW-791: At start up all running dag_runs are being checked, but not fixed
- AIRFLOW-790: DagRuns do not exist for certain tasks, but don’t get fixed
- AIRFLOW-788: Context unexpectedly added to hive conf
- AIRFLOW-792: Allow fixing of schedule when wrong start_date / interval was
Bolke, I will tackle logging, because I have started integrating with Google
StackDriver logging. I have a separate branch, but this will be for after
1.8. It will be configurable and context-aware, and everyone will benefit
(not only StackDriver logging).
On Fri, Jan 20, 2017 at 11:55 PM Bolke de Bruin
Will do. And thanks.
Adding another issue:
* Some of our DAGs are not getting scheduled for some unknown reason.
Need to investigate why.
Related but not root cause:
* Logging is so chatty that it gets really hard to find the real issue
Bolke.
On 20 Jan 2017, at 23:45, Dan Davydov wrote:
I'd be happy to lend a hand fixing these issues, and hopefully some others
are too. Do you mind creating JIRAs for these, since you have the full
context? I have created a JIRA for (1) and have assigned it to myself:
https://issues.apache.org/jira/browse/AIRFLOW-780
On Fri, Jan 20, 2017 at 1:01 AM,
— continued accidentally pressed send —
This is to report back on some of the (early) experiences we have with Airflow
1.8.0 (beta 1 at the moment):
1. The UI does not show a faulty DAG, leading to confusion for developers.
When a faulty DAG is placed in the dags folder, the UI used to report a parsing
error. Now it doesn't, due to the sep