Caching is a last-resort solution and probably not a good fit here. It
would introduce lag and confusion.
You seem to say that some things are evaluated twice within a scheduler
cycle? What would that be?
Another option is to reduce the number of database interactions and make
sure indexes are in place.
Scheduler loop times are definitely a concern (at least for Airbnb), and +1
for option 2 as well if it can be implemented correctly. What is important
for me is that we should always be able to easily tell which of the
dependencies are met and which aren't in the event-based model.
Hey Paul,
While I recognize the use case for this, I view it as more of a
deployment-related thing. It adds some complexity to Airflow, and I think
it's better suited to be run as part of a deployment system. A Python
script can be written that uses Airflow's models and SQLAlchemy to initiate
and
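The thread doesn't spell such a script out. As a minimal stand-in sketch of the idea, the snippet below flips a DAG's paused flag directly in the metadata database; it uses a toy stdlib `sqlite3` table rather than Airflow's real models and SQLAlchemy session, and the table name, columns, and `unpause` helper are simplified illustrations, not Airflow's actual schema or API.

```python
# Hypothetical deployment-side script: toggle DAGs directly in a metadata
# database. A toy sqlite3 `dag` table stands in for Airflow's models and
# SQLAlchemy session; names here are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.execute("INSERT INTO dag VALUES ('example_dag', 1)")  # starts paused

def unpause(dag_id):
    # A deployment step could run this after shipping new DAG files.
    conn.execute("UPDATE dag SET is_paused = 0 WHERE dag_id = ?", (dag_id,))

unpause("example_dag")
paused = conn.execute(
    "SELECT is_paused FROM dag WHERE dag_id = 'example_dag'").fetchone()[0]
```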
Hey Jeremiah,
Something that's been floating in my head is a basic assertion script for
DAGs that will validate things are as expected. This can be used to monitor
test DAGs (especially if we do nightly builds). The assertions could be
things like:
* This DAG should have an execution date every N
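One way the first assertion might be sketched: a cadence check over a list of execution dates. The function name and calling convention below are invented for illustration; a real monitor would pull the dates from Airflow's metadata rather than take them as a list.

```python
# Hypothetical DAG assertion helper: verify that consecutive execution
# dates are spaced by the expected interval. Names are illustrative.
from datetime import datetime, timedelta

def assert_execution_cadence(execution_dates, expected_interval):
    """Raise AssertionError if any gap between runs differs from expected."""
    for earlier, later in zip(execution_dates, execution_dates[1:]):
        gap = later - earlier
        assert gap == expected_interval, (
            f"gap {gap} between {earlier} and {later} != {expected_interval}")

# Five daily runs pass the daily-cadence assertion silently.
dates = [datetime(2016, 6, 1) + timedelta(days=i) for i in range(5)]
assert_execution_cadence(dates, timedelta(days=1))
```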
Hey Bolke,
> Are scheduler loop times a concern at all?
Yes, I strongly believe that they are. Especially as we add more DAGs/tasks.
I am not a fan of (1). Caching is just going to create cache consistency
issues, and be really annoying to manage, IMO.
I agree that (2) seems more appealing. I c
Hi,
I am looking at speeding up the scheduler. Currently, loop times increase with
the number of tasks in a DAG. This is due to TaskInstance.are_dependencies_met
executing several aggregation functions on the database. These calls are
expensive: between 0.05 and 0.15 s per task, and for every scheduler
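The shape of the fix being discussed can be sketched as follows: fetch every task's state in one query and evaluate dependencies in memory, instead of issuing an aggregation query per task. The table, columns, and `deps_met` helper below are simplified stand-ins, not Airflow's real `task_instance` schema or dependency logic.

```python
# Sketch: batch the per-task dependency queries into a single round trip.
# Simplified stand-in schema, not Airflow's actual task_instance table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, state TEXT)")
conn.executemany("INSERT INTO task_instance VALUES (?, ?)",
                 [("extract", "success"), ("transform", "success"),
                  ("load", "queued")])

# One query fetches every task's state; at 0.05-0.15 s per database call,
# N in-memory checks beat N aggregation queries as DAGs grow.
states = dict(conn.execute("SELECT task_id, state FROM task_instance"))

upstream = {"transform": ["extract"], "load": ["extract", "transform"]}

def deps_met(task_id):
    # All upstream tasks must have succeeded (simplified trigger rule).
    return all(states.get(u) == "success" for u in upstream.get(task_id, []))
```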
About structuring memory use: we have some major chunks of code set up as
web services. We have a separate machine that runs one service (a
Java-based app) and is limited to running 20 at once so that we can't run
out of RAM.
Our installation uses a separate Docker container for each Airflow app.
Note that in general, Airflow isn't designed to run thousands of small
tasks per minute. The Celery library on its own does that well without any
oversight from Airflow, though then you miss out on what Airflow has to
provide (complex dependency management, state handling, logging, retries,
...).
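To illustrate the workload shape meant here, the stdlib sketch below dispatches a thousand tiny tasks straight to a worker pool (a `ThreadPoolExecutor` standing in for a Celery worker, to keep the example dependency-free); with Airflow, each of these would additionally pay scheduler-loop and database overhead.

```python
# Stdlib stand-in for the "thousands of tiny tasks" pattern: dispatch work
# directly to a pool with no per-task scheduler or database round trips.
from concurrent.futures import ThreadPoolExecutor

def tiny_task(i):
    # Stand-in for a sub-second unit of work.
    return i * i

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(tiny_task, range(1000)))
```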
Hey,
Had a look at this Celery config option, but no luck. Also tried setting the
executor to LocalExecutor - same result.
Each task takes no more than 0.1 s, but the overall time is huge.
Thought that it could be due to disabled pickling, enabled it - almost no
change :(
Thanks very much for the help.
It seems I had two errors happening here. First, as Mattias pointed out, I
was doing it wrong with jinja2.PackageLoader. (It's always
embarrassing to email a dev list when the error is somewhere entirely
different.) I switched to jinja2.FileSystemLoader and it worked.
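For reference, a minimal version of that fix: `jinja2.FileSystemLoader` loads templates from a plain directory, whereas `PackageLoader` expects them shipped inside an installed Python package. The temp directory and template file below are invented for the demonstration.

```python
# Minimal check of the loader fix: render a template loaded from a plain
# directory via FileSystemLoader (directory and file are made up here).
import os
import tempfile

from jinja2 import Environment, FileSystemLoader

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "hello.txt"), "w") as f:
    f.write("Hello {{ name }}!")

env = Environment(loader=FileSystemLoader(tmpdir))
rendered = env.get_template("hello.txt").render(name="Airflow")
```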